**Mathematics of Planet Earth 10**

Bertrand Chapron · Dan Crisan · Darryl Holm · Etienne Mémin · Anna Radomska  *Editors*

# Stochastic Transport in Upper Ocean Dynamics

STUOD 2021 Workshop, London, UK, September 20–23

## **Mathematics of Planet Earth**

Volume 10

#### **Series Editors**

Dan Crisan, Imperial College London, London, UK
Ken Golden, University of Utah, Salt Lake City, UT, USA
Darryl D. Holm, Imperial College London, London, UK
Mark Lewis, University of Alberta, Edmonton, AB, Canada
Yasumasa Nishiura, Tohoku University, Sendai, Miyagi, Japan
Joseph Tribbia, National Center for Atmospheric Research, Boulder, CO, USA
Jorge Passamani Zubelli, Instituto de Matemática Pura e Aplicada, Rio de Janeiro, Brazil

This series provides well-written books of a variety of levels and styles, highlighting the fundamental role played by mathematics in a huge range of planetary contexts on a global scale. Climate, ecology, sustainability, public health, diseases and epidemics, management of resources and risk analysis are important elements. The mathematical sciences play a key role in these and many other processes relevant to Planet Earth, both as a fundamental discipline and as a key component of cross-disciplinary research. This creates the need, both in education and research, for books that are introductory to and abreast of these developments.

Springer's MoPE series will provide a variety of such books, including monographs, textbooks, contributed volumes and briefs suitable for users of mathematics, mathematicians doing research in related applications, and students interested in how mathematics interacts with the world around us. The series welcomes submissions on any topic of current relevance to the international Mathematics of Planet Earth effort, and particularly encourages surveys, tutorials and shorter communications in a lively tutorial style, offering a clear exposition of broad appeal.

#### **Responsible Editors:**

Martin Peters, Heidelberg (martin.peters@springer.com)
Robinson dos Santos, São Paulo (robinson.dossantos@springer.com)

#### **Additional Editorial Contacts:**

Donna Chernyk, New York (donna.chernyk@springer.com)
Masayuki Nakamura, Tokyo (masayuki.nakamura@springer.com)

## Stochastic Transport in Upper Ocean Dynamics

STUOD 2021 Workshop, London, UK, September 20–23

*Editors*

Bertrand Chapron
Ifremer – Institut Français de Recherche pour l'Exploitation de la Mer
Plouzané, France

Dan Crisan
Imperial College London
London, UK

Darryl Holm
Imperial College London
London, UK

Etienne Mémin
Inria – Institut National de Recherche en Sciences et Technologies du Numérique
Campus Universitaire de Beaulieu
Rennes, France

Anna Radomska
Imperial College London
London, UK

This work was supported by Horizon 2020 Framework Programme (856408)

ISSN 2524-4264   ISSN 2524-4272 (electronic)
Mathematics of Planet Earth
ISBN 978-3-031-18987-6   ISBN 978-3-031-18988-3 (eBook)
https://doi.org/10.1007/978-3-031-18988-3

Mathematics Subject Classification: 60Hxx, 60H17, 70L10, 35R60, 37M05, 37-11, 35Qxx, 65Pxx, 00B25

© The Editor(s) (if applicable) and The Author(s) 2023. This book is an open access publication.

**Open Access** This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## **Preface**

This volume contains the Proceedings of the 2nd Stochastic Transport in Upper Ocean Dynamics Workshop, held on 20–23 September 2021. After the success of the first workshop, the STUOD Principal Investigators, Prof. Dan Crisan (ICL), Prof. Bertrand Chapron (IFREMER), Prof. Darryl Holm (ICL) and Prof. Etienne Mémin (INRIA), were delighted to be back with another educational and inspirational event. The "Stochastic Transport in Upper Ocean Dynamics" (STUOD) project is supported by an ERC Synergy Grant and led by Imperial College London, the National Institute for Research in Digital Science and Technology (INRIA) and the French Research Institute for Exploitation of the Sea (IFREMER). The project aims to deliver new capabilities for assessing variability and uncertainty in upper ocean dynamics and to provide decision makers with a means of quantifying the effects of local patterns of sea level rise, heat uptake, carbon storage and change of oxygen content and pH in the ocean. The project will make use of multimodal data and will enhance the scientific understanding of marine debris transport, tracking of oil spills and accumulation of plastic in the sea.

As in the previous year, the 2nd STUOD Annual Workshop 2021 focused on a range of fundamental topical areas, including:


Each chapter in the present volume illustrates one or several of these topical areas. Many chapters offer new mathematical frameworks that are intended to enhance future research in the STUOD project.

The event brought together 65 participants from 11 countries: UK 28, France 22, USA 1, Canada 1, Australia 1, Czech Republic 1, Germany 4, Italy 4, Ireland 1, South Africa 1 and Switzerland 1. Moreover, the workshop was well attended by early-career academics, post-graduate students, industry representatives (Watson-Marlow Fluid Technology Group, OceanScope), senior members of the community and invited guests.

The scientific program of this 4-day hybrid event included invited presentations by STUOD Advisory Board Members: Prof Alberto Carrassi (University of Reading, NCEO), Prof Franco Flandoli (Scuola Normale Superiore), Prof Sebastian Reich (University of Potsdam) and Dr Eniko Székely (École Polytechnique Fédérale de Lausanne, Swiss Data Science Center); individual presentations by the STUOD Principal Investigators and post-doctoral researchers; and snapshot presentations and demos. The speakers included leading mid-career and senior researchers as well as early-career researchers. Moreover, the forum gave investigators at an early stage of their career opportunities for discussions with established scientists, fostering potential future research collaborations, networking, and the inclusion and training of the next generation of researchers.

The photograph above shows some participants attending the event in person during a break between lectures.

Most of the lectures were video-recorded and may be viewed on the STUOD YouTube channel.

The following is a brief description of the 19 contributions included in the proceedings:

The submitted manuscripts include the paper by **Dan Crisan and Prince Romeo Mensah**, entitled "**Blow-up of Strong Solutions of the Thermal Quasi-Geostrophic Equation**". This paper concerns the system of coupled equations that

governs the evolution of the buoyancy and potential vorticity of a fluid. This system has been shown in recent work of the authors and their collaborators to possess a local in time solution. In this paper, the authors give a characterization of the blow-up of solutions of the system in the spirit of the classical Beale–Kato–Majda blow-up criterion for the solution of the Euler equation.

The contribution of **Arnaud Debussche, Berenger Hug, and Etienne Mémin**, entitled "**Modelling Under Location Uncertainty: A Convergent Large-Scale Representation of the Navier-Stokes Equations**", introduces martingale solutions for 2D and 3D stochastic Navier-Stokes equations in the framework of the modelling under location uncertainty (LU). Such solutions are unique when the spatial dimension is 2D. The authors also prove that, if the noise intensity goes to zero, these solutions converge to a solution of the deterministic Navier-Stokes equation.

**Evgueni Dinvay** considers in the paper "**A Stochastic Benjamin-Bona-Mahony Type Equation**" a particular nonlinear dispersive stochastic equation recently introduced as a model describing surface water waves under location uncertainty. The corresponding noise term is introduced through a Hamiltonian formulation, which guarantees the energy conservation of the flow. The author shows that the initial-value problem has a unique solution.

**Benjamin Dufée, Etienne Mémin, and Dan Crisan** investigate in the paper "**Observation-Based Noise Calibration: An Efficient Dynamics for the Ensemble Kalman Filter**" the calibration of the stochastic noise in order to guide its realizations towards the observational data used for the assimilation. This is done in the context of the stochastic parametrization under location uncertainty (LU) and data assimilation. The new methodology is mathematically justified by the use of the Girsanov theorem and yields significant improvements in the experiments carried out on the surface quasi-geostrophic (SQG) model, when applied to ensemble Kalman filters. The test case studied in the paper shows improvements of the peak MSE from 85% to 93%.

The paper by **Camilla Fiorini, Pierre-Marie Boulvard, Long Li, and Etienne Mémin**, entitled "**A Two-Step Numerical Scheme in Time for Surface Quasi Geostrophic Equations Under Location Uncertainty**", considers the surface quasi-geostrophic (SQG) system under location uncertainty (LU) and proposes a Milstein-type scheme for these equations, which is then used in a multi-step method. The SQG system considered in the paper consists of one stochastic partial differential equation, which models the stochastic transport of the buoyancy, and a linear operator linking the velocity and the buoyancy. In the LU setting, the Euler-Maruyama scheme converges with weak order 1 and strong order 0.5. The authors develop higher order schemes in time, based on a Milstein-type scheme in a multistep framework. They compare different kinds of Milstein schemes. The scheme with the best performance is then included in the two-step scheme. Finally, they show how their two-step scheme decreases the error in comparison to other multistep schemes.
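The gap between the convergence orders mentioned here can be illustrated on a scalar test equation. The sketch below is our own illustration, not the authors' SQG scheme: it compares the strong error of the Euler-Maruyama and Milstein methods on geometric Brownian motion, whose exact solution is known in closed form.

```python
import numpy as np

# Illustrative comparison (not the paper's scheme): strong error of
# Euler-Maruyama vs Milstein for geometric Brownian motion
#   dX = mu*X dt + sigma*X dW,  X(0) = 1,
# with exact solution X(T) = exp((mu - sigma^2/2)*T + sigma*W(T)).

def strong_errors(mu=0.05, sigma=0.5, T=1.0, n_steps=100, n_paths=2000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    x_em = np.ones(n_paths)
    x_mil = np.ones(n_paths)
    for k in range(n_steps):
        dw = dW[:, k]
        # Euler-Maruyama: strong order 0.5
        x_em = x_em + mu * x_em * dt + sigma * x_em * dw
        # Milstein: adds the Ito correction term, strong order 1.0
        x_mil = (x_mil + mu * x_mil * dt + sigma * x_mil * dw
                 + 0.5 * sigma**2 * x_mil * (dw**2 - dt))
    w_T = dW.sum(axis=1)
    x_exact = np.exp((mu - 0.5 * sigma**2) * T + sigma * w_T)
    return (np.mean(np.abs(x_em - x_exact)), np.mean(np.abs(x_mil - x_exact)))

em_err, mil_err = strong_errors()
print(em_err, mil_err)
```

Driving both schemes with the same Brownian increments and comparing against the exact solution at time T makes the difference between strong order 0.5 and order 1 directly visible.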

The contribution of **Franco Flandoli and Eliseo Luongo**, entitled "**The Dissipation Properties of Transport Noise**", presents in a compact way the latest results about the dissipation properties of transport noise in fluid mechanics. Motivated by the fact that transport noise is natural in a passive scalar equation for the heat diffusion and transport, the authors introduce several results about enhanced dissipation due to the noise. Rigorous statements are matched with numerical experiments to understand that the sufficient conditions stated are not yet optimal but give a first useful indication.

**Daniel Goodair** presents in the paper "**Existence and Uniqueness of Maximal Solutions to a 3D Navier-Stokes Equation with Stochastic Lie Transport**" a criterion for showing that an abstract SPDE possesses a unique maximal strong solution. This is then applied to a 3D stochastic Navier-Stokes equation. Inspired by the classical work of Kato and Lai, the author provides a comparable result in the stochastic case applicable to a variety of noise structures such as additive, multiplicative and transport. In particular, the criterion is designed to fit viscous fluid dynamics models with stochastic advection by Lie transport. Its application to the incompressible Navier-Stokes equation matches the existence and uniqueness result of the deterministic theory.

**Darryl D. Holm, Ruiao Hu, and Oliver D. Street** present in "**Coupling of Waves to Sea Surface Currents Via Horizontal Density Gradients**" a set of mathematical models and numerical simulations motivated by satellite observations of horizontal sea surface fluid motions that show the close coordination between thermal fronts and the vertical motion of waves or, after an approximation, the slowly varying envelope of the rapidly oscillating waves. This coordination of fluid movements with wave envelopes occurs most dramatically when strong horizontal buoyancy gradients are present, e.g., at thermal fronts. The nonlinear models of this coordinated movement presented in the paper may provide future opportunities for the optimal design of satellite imagery that could simultaneously capture the dynamics of both waves and currents directly. The models derived in the paper appear first in their un-approximated form, then again with a slowly varying envelope (SVE) approximation using the WKB approach. The WKB wave-current-buoyancy interaction model derived by the authors for a free surface with horizontal buoyancy gradients indicates that the mechanism for these correlations is the ponderomotive force of the slowly varying envelope of rapidly oscillating waves acting on the surface currents via the horizontal buoyancy gradient. In this model, the buoyancy gradient appears explicitly in the WKB wave momentum, which in turn generates density-weighted potential vorticity whenever the buoyancy gradient is not aligned with the wave-envelope gradient.

The contribution of **Ruiao Hu and Stuart Patching**, entitled "**Variational Stochastic Parameterisations and Their Applications to Primitive Equation Models**", presents a numerical investigation into the stochastic parameterizations of the primitive equations (PE) using the stochastic advection by Lie transport (SALT) and stochastic forcing by Lie transport (SFLT) frameworks. These frameworks were chosen due to their structure-preserving introduction of stochasticity, which decomposes the transport velocity and fluid momentum into their drift and stochastic parts, respectively. In this paper, the authors develop a new calibration methodology to implement the momentum decomposition of SFLT, and they compare this methodology with the Lagrangian path methodology implemented for SALT. The resulting stochastic primitive equations are then integrated numerically using a modification of the FESOM2 code. For certain choices of the stochastic parameters, the authors show that SALT causes an increase in the eddy kinetic energy field and an improvement in the spatial spectrum. SFLT also shows improvements in these areas, though to a lesser extent. The SALT approach, however, produces an excessive downwards diffusion of temperature, compared to high-resolution deterministic simulations.

The paper by **Oana Lang and Wei Pan**, entitled "**A Pathwise Parameterisation for Stochastic Transport**", sets the stage for a new probabilistic approach to effectively calibrate in a pathwise manner a general class of stochastic nonlinear fluid dynamics models. The authors focus on a 2D Euler SALT equation, showing that the driving stochastic parameter can be calibrated in an optimal way to match a set of given data. Moreover, they show that this model is robust with respect to the stochastic parameters.

The work by **Long Li, Etienne Mémin, and Gilles Tissot**, entitled "**Stochastic Parameterization with Dynamic Mode Decomposition**", considers a physical stochastic parameterization to account for the effects of the unresolved small scale on the large-scale flow dynamics. This random model is based on a stochastic transport principle, which ensures a strong energy conservation. The dynamic mode decomposition (DMD) is performed on high-resolution data to learn a basis of the unresolved velocity field, on which the stochastic transport velocity is expressed. The time-harmonic property of the DMD modes allows the authors to perform a clean separation between time-differentiable and time-decorrelated components. The corresponding random scheme is assessed on a quasi-geostrophic (QG) model.
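As a point of reference, the core of DMD can be sketched in a few lines (a generic textbook version, not the calibration pipeline of the paper): given snapshot matrices X and X' whose columns are consecutive states, one computes the spectrum of the reduced operator obtained from a truncated SVD of X.

```python
import numpy as np

# Minimal exact-DMD sketch (illustrative, not the paper's implementation):
# given snapshot pairs x_{k+1} = A x_k, recover the spectrum of A from
# the rank-r reduced operator Atilde = U* X' V Sigma^{-1}, with X = U Sigma V*.

def dmd_eigs(X, Xp, r):
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    Atilde = U.conj().T @ Xp @ Vh.conj().T @ np.diag(1.0 / s)
    return np.linalg.eigvals(Atilde)
```

The DMD eigenvalues encode growth rates and frequencies of the modes; it is the frequency content (the time-harmonic structure) that enables the separation described above.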

The paper by **Alexander Lobbe**, entitled "**Deep Learning for the Benes Filter**", concerns the filtering problem, in other words, the optimal estimation of a hidden state given partial and noisy observations. Filtering is extensively studied in the theoretical and applied mathematical literature. One of the central challenges in filtering today is the numerical approximation of the optimal filter. The author presents a brief study of a new numerical method based on the mesh-free neural network representation of the density of the solution of the filtering problem achieved by deep learning. Based on the classical SPDE splitting method, the algorithm introduced includes a recursive normalization procedure to recover the normalized conditional distribution of the signal process. The present work uses the Benes model as a benchmark: within the analytically tractable setting of the Benes filter, the author discusses the role of nonlinearity in the filtering model equations for the choice of the domain of the neural network. Further, he presents the first study of the neural network method with an adaptive domain for the Benes model.

Data assimilation techniques are the state-of-the-art approaches in the reconstruction of a spatio-temporal geophysical state such as the atmosphere or the ocean. These methods rely on a numerical model that fills the spatial and temporal gaps in the observational network. Unfortunately, limitations regarding the uncertainty of the state estimate may arise when considering the restriction of the data assimilation problems to a small subset of observations, as encountered for instance in ocean surface reconstruction. These limitations motivated the exploration of reconstruction techniques that do not rely on numerical models. In this context, the increasing availability of geophysical observations and model simulations motivates the exploitation of machine learning tools to tackle the reconstruction of ocean surface variables. In the paper "**End-to-End Kalman Filter in a High Dimensional Linear Embedding of the Observations**", by **Said Ouala, Pierre Tandeo, Bertrand Chapron, Fabrice Collard and Ronan Fablet**, the authors formulate sea surface spatio-temporal reconstruction problems as state space Bayesian smoothing problems with unknown augmented linear dynamics. The solution of the smoothing problem, given by the Kalman smoother, is written in a differentiable framework which allows, given some training data, to optimize the parameters of the state space model.
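For readers less familiar with the classical building block behind this paper, a minimal linear Kalman filter looks as follows. This is a scalar random-walk sketch under assumed noise variances `q` and `r`, far simpler than the paper's learned high-dimensional embedding.

```python
import numpy as np

# Minimal scalar Kalman filter (illustrative sketch): random-walk state
# x_{k+1} = x_k + w_k, Var(w) = q, observed as y_k = x_k + v_k, Var(v) = r.

def kalman_filter(ys, q=0.01, r=0.25, m0=0.0, p0=1.0):
    m, p, out = m0, p0, []
    for y in ys:
        p = p + q            # predict: variance grows by process noise
        k = p / (p + r)      # Kalman gain
        m = m + k * (y - m)  # update mean with the new observation
        p = (1 - k) * p      # update variance
        out.append(m)
    return np.array(out)
```

The Kalman smoother used in the paper additionally runs a backward pass over the same quantities; the differentiable formulation then lets the state-space parameters themselves be optimized from training data.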

Large-scale weather can often be successfully described using a small number of patterns. A statistical description of re-analysed pressure fields identifies these recurring patterns with clusters in state space, also called *regimes*. Recently, these weather regimes have been described through instantaneous, local indicators of dimension and persistence, borrowed from dynamical systems theory and extreme value theory. Using similar indicators and going further, **Paul Platzer, Bertrand Chapron, and Pierre Tandeo** focus in the paper "**Dynamical Properties of Weather Regime Transitions**" on weather regime transitions. They use sixty years of winter-time sea-level pressure reanalysis data centred on the North-Atlantic Ocean and western Europe. These experiments reveal regime-dependent behaviours of dimension and persistence near transitions, although on average one observes an increase of dimension and a decrease of persistence near transitions. The effect of transition on persistence is stronger and lasts longer than on dimension. The findings confirm the relevance of such dynamical indicators for the study of large-scale weather regimes and reveal their potential to be used for both the understanding and detection of weather regime transitions.

Standard maximum likelihood or Bayesian approaches to parameter estimation for stochastic differential equations are known not to be robust to perturbations in the continuous-in-time data. In the paper "**Frequentist Perspective on Robust Parameter Estimation Using the Ensemble Kalman Filter**", **Sebastian Reich** gives a rather elementary explanation of this observation in the context of continuous-time parameter estimation using an ensemble Kalman filter. The author employs the frequentist perspective to shed new light on two robust estimation techniques, namely subsampling the data and rough path corrections. He also illustrates the findings through a simple numerical experiment.

The contribution of **Valentin Resseguier, Erwan Hascoet and Bertrand Chapron**, entitled "**Random Ocean Swell-Rays: A Stochastic Framework**", concerns swell systems that radiate across ocean basins. Far from their sources, emerging surface waves have low steepness characteristics, with very slow amplitude variations. Swell propagation then closely follows principles of geometrical optics, that is, the eikonal approximation to the wave equation, with a constant wave period along geodesics, when following a wave packet at its group speed. The phase averaged evolution of quasi-linear wave fields is then dominated by interactions with underlying current and/or topography changes. Comparable to the propagation of light in a slowly varying medium, over many wavelengths, cumulative effects can lead to refraction. This opens the possibility of using surface swell waves as probes to estimate turbulence along their propagating path.

**Louis Thiry, Long Li and Etienne Mémin** present in the paper, entitled "**Modified (Hyper-) Viscosity for Coarse-Resolution Ocean Models**", a simple parameterization for coarse-resolution ocean models. To replace computationally expensive high-resolution ocean models, the authors develop a computationally cheap parameterization for coarse-resolution models based solely on the modification of the viscosity term in advection equations. The parametrization is meant to reproduce the mean quantities like pressure, velocity or vorticity computed from a high-resolution reference solution or using observations. The authors test this new parameterization on a double-gyre quasi-geostrophic model in the eddy-permitting regime. The results show that the proposed scheme significantly improves the energy statistics and the intrinsic variability on the coarse mesh. This method will serve as a deterministic basis model for coarse-resolution stochastic parameterizations in future works.

Resolving numerically all the scale interactions of ocean dynamics in a high-resolution realistic configuration is today far beyond reach, and only large-scale representations can be afforded. **Francesco L. Tucciarone, Etienne Mémin and Long Li** study in the paper "**Primitive Equations Under Location Uncertainty: Analytical Description and Model Development**" a stochastic parameterization of the ocean primitive equations derived within the modelling under location uncertainty framework. Numerical assessments built with the NEMO core's code are provided for a double-gyres configuration.

The paper by **Yicun Zhen, Bertrand Chapron and Etienne Mémin**, entitled "**Bridging Koopman Operator and Time-Series Auto-Correlation Based Hilbert-Schmidt Operator**", considers Hilbert-Schmidt operators associated with stationary continuous-time processes. A Hilbert space and a (time-shift) continuous one-parameter semigroup of isometries are introduced and analysed. Under some technical assumptions, the continuous one-parameter semigroup is shown to be equivalent, almost surely, to the classical Koopman one-parameter semigroup.

Finally, the STUOD Organizing Committee would like to acknowledge the financial and in-kind support received from several sources: the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (ERC, Grant Agreement No 856408) – for providing funds to cover the travel expenses of the invited speakers, catering costs and administrative support; Imperial College London – for offering the conference venue.

STUOD Organizing Committee:
Prof. Bertrand Chapron (IFREMER)
Prof. Dan Crisan (ICL)
Prof. Darryl Holm (ICL)
Prof. Etienne Mémin (INRIA)
Dr Anna Radomska (ICL)

Plouzané, France   Bertrand Chapron
London, UK   Dan Crisan
London, UK   Darryl Holm
Rennes, France   Etienne Mémin
London, UK   Anna Radomska

May 2022

## **Organization**

#### **Program Chairs**

Bertrand Chapron, Ifremer, France
Dan Crisan, Imperial College London, UK
Darryl Holm, Imperial College London, UK
Etienne Mémin, Inria, France
Anna Radomska, Imperial College London, UK

## **Contents**



## **Blow-Up of Strong Solutions of the Thermal Quasi-Geostrophic Equation**

**Dan Crisan and Prince Romeo Mensah**

**Abstract** The Thermal Quasi-Geostrophic (TQG) equation is a coupled system of equations that governs the evolution of the buoyancy and the potential vorticity of a fluid. It has a local in time solution as proved in Crisan et al. (Theoretical and computational analysis of the thermal quasi-geostrophic model. Preprint arXiv:2106.14850, 2021). In this paper, we give a criterion for the blow-up of solutions to the Thermal Quasi-Geostrophic equation, in the spirit of the classical Beale–Kato–Majda blow-up criterion (cf. Beale et al., Comm. Math. Phys. 94(1), 61–66, 1984) for the solution of the Euler equation.

**Keywords** Blow-up criterion · Thermal Quasi-Geostrophic equation · Modified Helmholtz operator

#### **1 Introduction**

The Thermal Quasi-Geostrophic (TQG) equation is a coupled system of equations governing the evolution of the buoyancy $b : (t, \mathbf{x}) \in [0, T] \times \mathbb{R}^2 \to b(t, \mathbf{x}) \in \mathbb{R}$ and the potential vorticity $q : (t, \mathbf{x}) \in [0, T] \times \mathbb{R}^2 \to q(t, \mathbf{x}) \in \mathbb{R}$ in the following way:

$$
\partial_t b + (\mathbf{u} \cdot \nabla) b = 0, \tag{1}
$$

$$
\partial_t q + (\mathbf{u} \cdot \nabla)(q - b) = -(\mathbf{u}_h \cdot \nabla) b, \tag{2}
$$

$$b(0, \mathbf{x}) = b_0(\mathbf{x}), \qquad q(0, \mathbf{x}) = q_0(\mathbf{x}), \tag{3}$$

D. Crisan · P. R. Mensah
Department of Mathematics, Imperial College, London, UK
e-mail: d.crisan@imperial.ac.uk; p.mensah@imperial.ac.uk

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3_1

where

$$\mathbf{u} = \nabla^{\perp} \psi, \qquad \mathbf{u}_h = \frac{1}{2} \nabla^{\perp} h, \qquad q = (\Delta - 1)\psi + f. \tag{4}$$

Here, $\psi : (t, \mathbf{x}) \in [0, T] \times \mathbb{R}^2 \to \psi(t, \mathbf{x}) \in \mathbb{R}$ is the streamfunction, $h : \mathbf{x} \in \mathbb{R}^2 \to h(\mathbf{x}) \in \mathbb{R}$ is the spatial variation around a constant bathymetry profile and $f : \mathbf{x} \in \mathbb{R}^2 \to f(\mathbf{x}) \in \mathbb{R}$ is the Coriolis parameter. Since we are working on the whole space, we can supplement our system with the far-field condition

$$\lim_{|\mathbf{x}| \to \infty} (b(\mathbf{x}), \mathbf{u}(\mathbf{x})) = 0.$$

Our given set of data is $(\mathbf{u}_h, f, b_0, q_0)$ with regularity class:

$$\mathbf{u}_h \in W^{3,2}_{\mathrm{div}}(\mathbb{R}^2; \mathbb{R}^2), \quad f \in W^{2,2}(\mathbb{R}^2), \quad b_0 \in W^{3,2}(\mathbb{R}^2), \quad q_0 \in W^{2,2}(\mathbb{R}^2). \tag{5}$$

The TQG equation models the dynamics of a submesoscale geophysical fluid in thermal geostrophic balance, for which the Rossby number, the Froude number and the stratification parameter are all of the same asymptotic order. For a historical overview, modelling and other issues pertaining to the TQG equation, we refer the reader to [4].

In the following, we are interested in *strong solutions* of the system (1)–(4), which can naturally be defined in terms of just *b* and *q*, although the unknowns in the evolutionary Eqs. (1)–(2) are *b*, *q* and **u**. This is because, for a given *f*, one can recover the velocity **u** from the vorticity *q* by solving the equation

$$\mathbf{u} = \nabla^{\perp} (\Delta - 1)^{-1} (q - f)$$

derived from (4). Also note that a consequence of the equation $\mathbf{u} = \nabla^{\perp} \psi$ in (4) is that $\operatorname{div} \mathbf{u} = 0$, that is, the fluid is incompressible. With this information in hand, we now make precise the notion of a strong solution.
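Indeed, incompressibility follows directly from the definition of $\nabla^{\perp}$ and the symmetry of mixed partial derivatives:

$$\operatorname{div} \mathbf{u} = \operatorname{div} \nabla^{\perp} \psi = \partial_{x_1}\!\left( -\partial_{x_2} \psi \right) + \partial_{x_2}\!\left( \partial_{x_1} \psi \right) = 0.$$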

**Definition 1 (Local Strong Solution)** Let $(\mathbf{u}_h, f, b_0, q_0)$ be of regularity class (5). For some $T > 0$, we call the triple $(b, q, T)$ a *strong solution* to the system (1)–(4) if the following holds:

– The buoyancy $b$ satisfies $b \in C([0, T]; W^{3,2}(\mathbb{R}^2))$ and the equation

$$b(t) = b_0 - \int_0^t \operatorname{div}(b \mathbf{u}) \, \mathrm{d}\tau$$

holds for all $t \in [0, T]$;

– the potential vorticity $q$ satisfies $q \in C([0, T]; W^{2,2}(\mathbb{R}^2))$ and the equation

$$q(t) = q_0 - \int_0^t \left[ \operatorname{div}((q - b) \mathbf{u}) + \operatorname{div}(b \mathbf{u}_h) \right] \mathrm{d}\tau$$

holds for all $t \in [0, T]$.

Such local strong solutions exist on a maximal time interval. We define this as follows.

**Definition 2 (Maximal Solution)** Let $(\mathbf{u}_h, f, b_0, q_0)$ be of regularity class (5). For some $T_{\max} > 0$, we call $(b, q, T_{\max})$ a *maximal solution* to the system (1)–(4) if:

– for every $T < T_{\max}$, the triple $(b, q, T)$ is a local strong solution of (1)–(4);
– either $T_{\max} = \infty$, or there exists an increasing sequence of times $T_n \to T_{\max}$ such that

$$\lim_{T_n \to T_{\max}} \|b(T_n)\|_{W^{3,2}(\mathbb{R}^2)}^2 + \|q(T_n)\|_{W^{2,2}(\mathbb{R}^2)}^2 = \infty. \tag{6}$$

We shall call $T_{\max} > 0$ the *maximal time*.

The existence of a unique local strong solution of (1)–(4) has recently been shown in [4, Theorem 2.10] on the torus. A unique maximal solution also exists [4, Theorem 2.14], and the result also applies to the whole space [4, Remark 2.1]. We state the result here for completeness.

**Theorem 1** *For $(\mathbf{u}_h, f, b_0, q_0)$ of regularity class (5), there exists a unique maximal solution $(b, q, T_{\max})$ of the system (1)–(4).*

Before we state our main result, let us first present some notations used throughout this work.

#### *1.1 Notations*

In the following, we write *F* - *G* if there exists a generic constant *c >* 0 (that may vary from line to line) such that *<sup>F</sup>* <sup>≤</sup> *c G*. Functions mapping into <sup>R</sup><sup>2</sup> are **boldfaced** (for example the velocity **u**) while those mapping into R are not (for example the buoyancy *<sup>b</sup>* and vorticity *<sup>q</sup>*). For *<sup>k</sup>* <sup>∈</sup> <sup>N</sup> ∪ {0} and *<sup>p</sup>* ∈ [1*,*∞], *<sup>W</sup>k,p(*R2*)* is the usual Sobolev space of functions mapping into R with a natural modification for functions mapping into <sup>R</sup>2. For *<sup>p</sup>* <sup>=</sup> 2, *<sup>W</sup>k,*2*(*R2*)* is a Hilbert space with inner product *u, v <sup>W</sup>k,*2*(*R2*)* <sup>=</sup> <sup>|</sup>*β*|≤*<sup>k</sup><sup>∂</sup>βu, ∂βv ,* where · *,*  denotes the standard *<sup>L</sup>*2 inner product. For general *<sup>s</sup>* <sup>∈</sup> <sup>R</sup>, we use the norm

$$\|\boldsymbol{v}\|\_{W^{s,2}(\mathbb{R}^2)} \equiv \left(\int\_{\mathbb{R}^2} \left(1 + |\xi|^2\right)^s |\widehat{v}(\xi)|^2 \,d\xi\right)^{\frac{1}{2}} \tag{7}$$

defined in frequency space. Here, $\widehat{v}(\xi)$ denotes the Fourier transform of $v$. For simplicity, we write $\|\cdot\|_{s,2}$ for $\|\cdot\|_{W^{s,2}(\mathbb{R}^2)}$. When $k = s = 0$, we recover the usual $L^2(\mathbb{R}^2)$ space, whose norm we simply denote by $\|\cdot\|_2$. A similar notation $\|\cdot\|_p$ is used for the norms of general $L^p(\mathbb{R}^2)$ spaces for any $p \in [1,\infty]$, as well as for the inner product $\langle\cdot,\cdot\rangle_{k,2} := \langle\cdot,\cdot\rangle_{W^{k,2}(\mathbb{R}^2)}$ when $k \in \mathbb{N}$. Additionally, $W^{k,p}_{\mathrm{div}}(\mathbb{R}^2)$ represents the space of divergence-free vector-valued functions in $W^{k,p}(\mathbb{R}^2)$. With respect to differential operators, we let $\nabla_0 := (\partial_{x_1}, \partial_{x_2}, 0)^T$ and $\nabla_0^\perp := (-\partial_{x_2}, \partial_{x_1}, 0)^T$ be the extensions by zero of the two-dimensional differential operators $\nabla = (\partial_{x_1}, \partial_{x_2})^T$ and $\nabla^\perp := (-\partial_{x_2}, \partial_{x_1})^T$ to three dimensions, respectively. The Laplacian $\Delta = \mathrm{div}\,\nabla = \partial_{x_1 x_1} + \partial_{x_2 x_2}$ remains two-dimensional.

#### *1.2 Main Result*

Our main result is a blow-up criterion of Beale–Kato–Majda type [2] for a strong solution $(b, q, T)$ of (1)–(4). In particular, we show the following result.

**Theorem 2** *Suppose that (b, q, T ) is a local strong solution of* (1)*–*(4)*. If*

$$\int\_{0}^{T} \left( \|q(t)\|\_{\infty} + \|\nabla b(t)\|\_{\infty} \right) \mathrm{d}t \equiv K < \infty,\tag{8}$$

*then there exists a solution $(b', q', T')$ with $T' > T$ such that $(b', q') = (b, q)$ on* $[0, T]$*. Moreover, for all* $t \in [0, T]$*,*

$$\|b(t)\|_{3,2} + \|q(t)\|_{2,2} \le \left[\mathrm{e} + \|b_0\|_{3,2} + \|q_0\|_{2,2}\right]^{\exp(cK)} \exp\left[cT\exp(cK)\right].$$

An immediate consequence of the above theorem is the following:

**Corollary 1** *Assume that (b, q, T ) is a maximal solution. If T <* ∞*, then*

$$\int\_0^T \left( \|q(t)\|\_{\infty} + \|\nabla b(t)\|\_{\infty} \right) \mathrm{d}t = \infty$$

*and in particular,*

$$\sup\_{t \uparrow T} \left( \|q(t)\|\_{\infty} + \|\nabla b(t)\|\_{\infty} \right) = \infty.$$

#### **2 Blow-Up**

We devote the entirety of this section to the proof of Theorem 2. To achieve this, we first derive a suitable exact solution of what is referred to as the modified Helmholtz equation. Some authors call it the screened Poisson equation [3], while others, rather mistakenly, call it the Helmholtz equation; refer to [1] for the difference between the Helmholtz equation and the modified Helmholtz equation.

#### *2.1 Estimate for the* **2***D Modified Helmholtz Equation or the Screened Poisson Equation*

In the following, we want to find an exact solution $\psi : \mathbb{R}^2 \to \mathbb{R}$ of

$$(\Delta - 1)\psi(\mathbf{x}) = w(\mathbf{x}), \qquad \lim\_{|\mathbf{x}| \to \infty} \psi(\mathbf{x}) = 0 \tag{9}$$

for a given function $w \in W^{2,2}(\mathbb{R}^2)$. The corresponding two-dimensional free-space Green's function $G^{\text{free}}(\mathbf{x})$ for (9) must therefore solve

$$(\Delta - 1)G^{\text{free}}(\mathbf{x} - \mathbf{y}) = \delta(\mathbf{x} - \mathbf{y}), \qquad \lim\_{|\mathbf{x}| \to \infty} G^{\text{free}}(\mathbf{x} - \mathbf{y}) = 0 \tag{10}$$

in the sense of distributions. Indeed, one can verify that the Green's function is given by

$$G^{\text{free}}(\mathbf{x} - \mathbf{y}) = \frac{1}{2\pi} K\_0(|\mathbf{x} - \mathbf{y}|) \tag{11}$$

see [1, Table 9.5], where

$$K\_0(z) = \int\_0^\infty \frac{e^{-\sqrt{z^2 + r^2}}}{\sqrt{z^2 + r^2}} \,\mathrm{d}r$$

is the modified Bessel function of the second kind; see equation (8.432-9) on page 917 of [5] with $\nu = 0$ and $x = 1$. Since the integrand above is even in $r$, it follows that

$$G^{\rm free}(\mathbf{x} - \mathbf{y}) = \frac{i}{4} H\_0^{(1)}(i|\mathbf{x} - \mathbf{y}|) = \frac{1}{4\pi} \int\_{\mathbb{R}} \frac{e^{-\sqrt{|\mathbf{x} - \mathbf{y}|^2 + r^2}}}{\sqrt{|\mathbf{x} - \mathbf{y}|^2 + r^2}} \,\mathrm{d}r \tag{12}$$

where $H_0^{(1)}$ is the zeroth-order Hankel function of the first kind; see equation (11.117) in [1] and equation (8.421-9) on page 915 of [5]. Therefore,

$$\psi(\mathbf{x}) = \frac{1}{4\pi} \int\_{\mathbb{R}^3} \frac{e^{-|(\mathbf{x} - \mathbf{y}, -r)|}}{|(\mathbf{x} - \mathbf{y}, -r)|} w((\mathbf{y}, 0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \tag{13}$$

$$=:\psi\left((\mathbf{x},0)\right)\tag{14}$$

where we have used the identity $\sqrt{|\mathbf{x} - \mathbf{y}|^2 + r^2} = |(\mathbf{x}, 0) - (\mathbf{y}, r)| = |(\mathbf{x} - \mathbf{y}, -r)|$. We can therefore view the argument of the streamfunction $\psi$ as a 3D vector with zero vertical component.
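As a quick numerical sanity check (an illustrative aside, not part of the analysis; the helper name `g_free_integral` is ours), the Bessel form (11) and the integral representation (12) of the Green's function can be compared using SciPy:

```python
# Compare (11) G_free = K_0(rho)/(2*pi) with the integral representation (12).
import numpy as np
from scipy.integrate import quad
from scipy.special import k0

def g_free_integral(rho):
    """(12): (1/(4*pi)) * integral over r in R of e^{-sqrt(rho^2+r^2)}/sqrt(rho^2+r^2) dr."""
    integrand = lambda r: np.exp(-np.sqrt(rho**2 + r**2)) / np.sqrt(rho**2 + r**2)
    val, _ = quad(integrand, -np.inf, np.inf)
    return val / (4.0 * np.pi)

# the two representations agree at several sample radii
for rho in (0.1, 0.5, 1.0, 2.0):
    assert abs(g_free_integral(rho) - k0(rho) / (2.0 * np.pi)) < 1e-6
```

The agreement reflects the substitution $r = \rho \sinh u$, which turns the integral in (12) into the standard representation $K_0(\rho) = \int_0^\infty e^{-\rho\cosh u}\,\mathrm{d}u$.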

#### *2.2 Log-Sobolev Estimate for Velocity Gradient*

Our goal now is to find a suitable estimate for the Lipschitz norm of **u** that solves

$$\mathbf{u} = \nabla^{\perp} \psi, \qquad (\Delta - 1)\psi = w \tag{15}$$

where $w \in W^{2,2}(\mathbb{R}^2)$ is given. In particular, inspired by Beale et al. [2], we aim to show Proposition 1 below. This log-estimate is the crucial ingredient that allows us to obtain our blow-up criterion in terms of just the buoyancy gradient and the vorticity, although preliminary estimates may have suggested estimating the velocity gradient as well.

**Proposition 1** *For a given <sup>w</sup>* <sup>∈</sup> *<sup>W</sup>*2*,*2*(*R2*), any* **<sup>u</sup>** *solving* (15) *satisfies*

$$\|\mathbf{u}\|\_{1,\infty} \lesssim 1 + (1 + 2\ln^+(\|w\|\_{2,2})) \|w\|\_{\infty} \tag{16}$$

*where* ln<sup>+</sup> *a* = ln *a if a* ≥ 1 *and* ln<sup>+</sup> *a* = 0 *otherwise.*

*Proof* To show (16), we fix *<sup>L</sup>* <sup>∈</sup> *(*0*,* <sup>1</sup>] and for **<sup>z</sup>** <sup>∈</sup> <sup>R</sup>3, we let *ζL(***z***)* be a smooth cut-off function satisfying

$$\zeta\_L(\mathbf{z}) = \begin{cases} 1 & : |\mathbf{z}| < L, \\ 0 & : |\mathbf{z}| > 2L. \end{cases}$$

and $|\partial \zeta_L(\mathbf{z})| \lesssim L^{-1}$, where $\partial := \nabla_0^\perp$ or $\nabla_0$, as well as $|\nabla_0 \nabla_0^\perp \zeta_L(\mathbf{z})| \lesssim L^{-2}$. The latter requirement ensures control at the inflection points of the cut-off, that is, where its graph passes between the constant, concave-up and concave-down portions. We now define the following sets:

$$\begin{aligned} B\_1 &:= \left\{ (\mathbf{y}, r) \in \mathbb{R}^3 \, : \, | (\mathbf{x}, 0) - (\mathbf{y}, r) | = | (\mathbf{x} - \mathbf{y}, -r) | < 2L \right\}, \\ B\_2 &:= \left\{ (\mathbf{y}, r) \in \mathbb{R}^3 \, : \, L \le | (\mathbf{x} - \mathbf{y}, -r) | \le 1 \right\}, \\ B\_3 &:= \left\{ (\mathbf{y}, r) \in \mathbb{R}^3 \, : \, | (\mathbf{x} - \mathbf{y}, -r) | > 1 \right\}, \end{aligned}$$

so that by adding and subtracting *ζL*, we obtain

$$\begin{aligned} |\nabla \mathbf{u}(\mathbf{x})| = |\nabla\_0 \nabla\_0^\perp \psi((\mathbf{x}, 0))| &\le |\nabla\_0 (\mathbf{u}\_1((\mathbf{x}, 0)), 0)| + |\nabla\_0 (\mathbf{u}\_2^1((\mathbf{x}, 0)), 0)| + |\nabla\_0 (\mathbf{u}\_2^2((\mathbf{x}, 0)), 0)| \\ &\quad + |\nabla\_0 (\mathbf{u}\_2^3((\mathbf{x}, 0)), 0)| + |\nabla\_0 (\mathbf{u}\_2^4((\mathbf{x}, 0)), 0)| + |\nabla\_0 (\mathbf{u}\_3((\mathbf{x}, 0)), 0)| \\ &=: |\nabla\_0 \mathbf{u}\_1| + |\nabla\_0 \mathbf{u}\_2^1| + |\nabla\_0 \mathbf{u}\_2^2| + |\nabla\_0 \mathbf{u}\_2^3| + |\nabla\_0 \mathbf{u}\_2^4| + |\nabla\_0 \mathbf{u}\_3| \end{aligned}$$

where

$$\begin{aligned} \nabla\_0 \mathbf{u}\_1 &:= \frac{1}{4\pi} \int\_{B\_1} \zeta\_L((\mathbf{x}-\mathbf{y},-r)) \, \frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|} \, \nabla\_0 \nabla\_0^\perp w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r, \\ \nabla\_0 \mathbf{u}\_2^1 &:= \frac{1}{4\pi} \int\_{B\_2} \left[1 - \zeta\_L((\mathbf{x}-\mathbf{y},-r))\right] \nabla\_0 \nabla\_0^\perp \left[\frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|}\right] w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r, \\ \nabla\_0 \mathbf{u}\_2^2 &:= \frac{1}{4\pi} \int\_{B\_2} \nabla\_0 \nabla\_0^\perp \left[1 - \zeta\_L((\mathbf{x}-\mathbf{y},-r))\right] \frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|} \, w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r, \\ \nabla\_0 \mathbf{u}\_2^3 &:= \frac{1}{4\pi} \int\_{B\_2} \nabla\_0^\perp \left[1 - \zeta\_L((\mathbf{x}-\mathbf{y},-r))\right] \nabla\_0 \left[\frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|}\right] w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r, \\ \nabla\_0 \mathbf{u}\_2^4 &:= \frac{1}{4\pi} \int\_{B\_2} \nabla\_0 \left[1 - \zeta\_L((\mathbf{x}-\mathbf{y},-r))\right] \nabla\_0^\perp \left[\frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|}\right] w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r, \\ \nabla\_0 \mathbf{u}\_3 &:= \frac{1}{4\pi} \int\_{B\_3} \nabla\_0 \nabla\_0^\perp \left\{\left[1 - \zeta\_L((\mathbf{x}-\mathbf{y},-r))\right] \frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|}\right\} w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r. \end{aligned}$$

For *L* ∈ *(*0*,* 1], we have that

$$\begin{split} |\nabla\_{0}\mathbf{u}\_{1}| &\lesssim \left( \int\_{B\_{1}} \frac{e^{-2|(\mathbf{x},0) - (\mathbf{y},r)|}}{|(\mathbf{x},0) - (\mathbf{y},r)|^{2}} \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \right)^{\frac{1}{2}} \|\nabla\_{0}\nabla\_{0}^{\perp}w((\mathbf{y},0))\|\_{2} \\ &\lesssim \left( \int\_{0}^{2L} \frac{e^{-2s}}{s^{2}} \, s^{2}\,\mathrm{d}s \right)^{\frac{1}{2}} \|w\|\_{2,2} \lesssim \left(1 - e^{-4L}\right)^{\frac{1}{2}} \|w\|\_{2,2} \lesssim L^{\frac{1}{2}} \|w\|\_{2,2}. \end{split}$$

Now note that

$$\begin{split} \nabla\_{0} \mathbf{u}\_{2}^{1} :=& \frac{1}{4\pi} \int\_{B\_{2}} \left[ 1 - \zeta\_{L}((\mathbf{x} - \mathbf{y}, -r)) \right] \Bigg[ \frac{2(\mathbf{x} - \mathbf{y})^{T}(\mathbf{x} - \mathbf{y})^{\perp}}{(|\mathbf{x} - \mathbf{y}|^{2} + r^{2})^{2}} + \frac{3(\mathbf{x} - \mathbf{y})^{T}(\mathbf{x} - \mathbf{y})^{\perp}}{(|\mathbf{x} - \mathbf{y}|^{2} + r^{2})^{\frac{5}{2}}} \\ & - \frac{1}{|\mathbf{x} - \mathbf{y}|^{2} + r^{2}} \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} - \frac{1}{(|\mathbf{x} - \mathbf{y}|^{2} + r^{2})^{\frac{3}{2}}} \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \\ & + \frac{(\mathbf{x} - \mathbf{y})^{T}(\mathbf{x} - \mathbf{y})^{\perp}}{(|\mathbf{x} - \mathbf{y}|^{2} + r^{2})^{\frac{3}{2}}} + \frac{(\mathbf{x} - \mathbf{y})^{T}(\mathbf{x} - \mathbf{y})^{\perp}}{(|\mathbf{x} - \mathbf{y}|^{2} + r^{2})^{2}} \Bigg] e^{-|(\mathbf{x} - \mathbf{y}, -r)|} w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \\ =:& \sum\_{l=1}^{6} \mathbb{K}\_{l}((\mathbf{x} - \mathbf{y}, -r)). \end{split}$$

Clearly, $|\mathbf{x}-\mathbf{y}|^2 \le |\mathbf{x}-\mathbf{y}|^2 + r^2 = |(\mathbf{x}-\mathbf{y},-r)|^2$, and for any $L \in (0,1]$, the inequalities

$$(e^{-L} - e^{-1}) \le (1 - e^{-1}) \le (1 - e^{-1})(1 - \ln(L)) \lesssim (1 - \ln(L))$$

hold independently of $L$. Therefore, for $L \in (0,1]$, it follows that

$$\begin{split} |\mathbb{K}\_{1}((\mathbf{x}-\mathbf{y},-r))| + |\mathbb{K}\_{3}((\mathbf{x}-\mathbf{y},-r))| + |\mathbb{K}\_{6}((\mathbf{x}-\mathbf{y},-r))| &\lesssim \|w\|\_{\infty} \int\_{B\_{2}} \frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|^{2}} \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \\ &\lesssim \|w\|\_{\infty} \int\_{L}^{1} \frac{e^{-s}}{s^{2}}\, s^{2} \,\mathrm{d}s \\ &\lesssim \|w\|\_{\infty} (1 - \ln(L)). \end{split}$$

Again, we can use $|\mathbf{x}-\mathbf{y}|^2 \le |\mathbf{x}-\mathbf{y}|^2 + r^2$ and the fact that the inequalities

$$(e^{-L}(L+1) - 2e^{-1}) \le (1 - 2e^{-1}) \le (1 - 2e^{-1})(1 - \ln(L)) \lesssim (1 - \ln(L))$$

hold independently of $L \in (0,1]$ to obtain

$$\begin{aligned} \left| \mathbb{K}\_5((\mathbf{x} - \mathbf{y}, -r)) \right| &\lesssim \|w\|\_{\infty} \int\_{B\_2} \frac{e^{-|(\mathbf{x} - \mathbf{y}, -r)|}}{|(\mathbf{x} - \mathbf{y}, -r)|} \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \\ &\lesssim \|w\|\_{\infty} \int\_L^1 \frac{e^{-s}}{s}\, s^2 \,\mathrm{d}s \\ &\lesssim \|w\|\_{\infty} (1 - \ln(L)). \end{aligned}$$

Finally, for $\mathbb{K}_2$ and $\mathbb{K}_4$, we also obtain

$$\begin{split} |\mathbb{K}\_2((\mathbf{x}-\mathbf{y},-r))| + |\mathbb{K}\_4((\mathbf{x}-\mathbf{y},-r))| &\lesssim \|w\|\_{\infty} \int\_{B\_2} \frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|^3} \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \\ &\lesssim \|w\|\_{\infty} \int\_L^1 \frac{e^{-s}}{s^3}\, s^2 \,\mathrm{d}s \\ &\lesssim \|w\|\_{\infty} \int\_L^1 \frac{1}{s} \,\mathrm{d}s \\ &\lesssim \|w\|\_{\infty} \left( -\ln(L) \right). \end{split}$$

We have shown that

$$|\nabla\_0 \mathbf{u}\_2^1| \lesssim \|w\|\_{\infty} (1 - \ln(L))\tag{17}$$

for $L \in (0,1]$. Also, the quantity $L^{-2}\left[e^{-L}(L+1) - e^{-2L}(2L+1)\right]$ is uniformly bounded for any $L \in (0,1]$, and as such,

$$|\nabla\_0 \mathbf{u}\_2^2| \lesssim \left(\int\_L^{2L} \frac{e^{-s}}{sL^2}\, s^2\,\mathrm{d}s\right) \|w\|\_{\infty} \lesssim \|w\|\_{\infty}.\tag{18}$$
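As an illustrative aside (not part of the proof; the helper name `h` is ours), the uniform boundedness just used can be confirmed numerically:

```python
# Check that h(L) = (1/L^2) * [e^{-L}(L+1) - e^{-2L}(2L+1)] stays bounded on (0, 1].
import numpy as np

def h(L):
    return (np.exp(-L) * (L + 1.0) - np.exp(-2.0 * L) * (2.0 * L + 1.0)) / L**2

# A Taylor expansion gives h(L) = 3/2 - (7/3)L + O(L^2) near L = 0, so h stays
# below 3/2 on the whole interval; we avoid very small L to dodge cancellation.
Ls = np.logspace(-4, 0, 500)
vals = h(Ls)
assert np.all(np.isfinite(vals)) and vals.max() <= 1.5
assert abs(h(1e-4) - 1.5) < 1e-3   # h approaches 3/2 as L -> 0+
```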

Next, we note that the estimates for $\nabla_0 \mathbf{u}_2^3$ and $\nabla_0 \mathbf{u}_2^4$ are identical, where in particular,

$$\begin{split} \nabla\_{0} \mathbf{u}\_{2}^{3} &:= \frac{-1}{4\pi} \int\_{B\_{2}} \nabla\_{0}^{\perp} \left[ 1 - \zeta\_{L}((\mathbf{x} - \mathbf{y}, -r)) \right] \bigg\{ \frac{(\mathbf{x} - \mathbf{y})^{T}}{|\mathbf{x} - \mathbf{y}|^{2} + r^{2}} \\ &\qquad + \frac{(\mathbf{x} - \mathbf{y})^{T}}{(|\mathbf{x} - \mathbf{y}|^{2} + r^{2})^{\frac{3}{2}}} \bigg\} e^{-|(\mathbf{x} - \mathbf{y}, -r)|} w((\mathbf{y},0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \\ &=: \mathbb{K}\_{7}((\mathbf{x} - \mathbf{y}, -r)) + \mathbb{K}\_{8}((\mathbf{x} - \mathbf{y}, -r)). \end{split}$$

Since $|\mathbf{x}-\mathbf{y}| \le 1$ holds on $B_2$, it follows from the condition $|\nabla_0^\perp \zeta_L(\mathbf{z})| \lesssim L^{-1}$ that

$$|\mathbb{K}\_7((\mathbf{x} - \mathbf{y}, -r))| \lesssim \left( \int\_L^{2L} \frac{e^{-s}}{s^2 L} s^2 \, \mathrm{d}s \right) \|w\|\_{\infty} \lesssim \|w\|\_{\infty}$$

since $L^{-1}\left[e^{-L} - e^{-2L}\right]$ is uniformly bounded in $L$. Similarly, we can use the fact that $|\mathbf{x}-\mathbf{y}| \le \left(|\mathbf{x}-\mathbf{y}|^2 + r^2\right)^{\frac{1}{2}}$ to obtain

$$|\mathbb{K}\_8((\mathbf{x}-\mathbf{y},-r))| \lesssim \left(\int\_L^{2L} \frac{e^{-s}}{s^2L} s^2 \, \mathrm{d}s\right) \|w\|\_{\infty} \lesssim \|w\|\_{\infty}.$$

We can therefore conclude that,

$$|\nabla\_0 \mathbf{u}\_2^3| + |\nabla\_0 \mathbf{u}\_2^4| \lesssim \|w\|\_{\infty}.\tag{19}$$

Similar to the estimate for $\nabla_0 \mathbf{u}_2^1$, we have that

$$|\nabla\_0 \mathbf{u}\_3| \lesssim \|w\|\_{\infty}.\tag{20}$$

It follows by summing up the various estimates above that

$$\|\nabla \mathbf{u}\|\_{\infty} \lesssim L^{\frac{1}{2}} \|w\|\_{2,2} + (1 - \ln(L)) \|w\|\_{\infty}.\tag{21}$$

It remains to show that the estimate (21) also holds for **u**. For this, we first recall that

$$\mathbf{u}(\mathbf{x}) = \frac{1}{4\pi} \int\_{\mathbb{R}^3} \nabla\_0^\perp \left[ \frac{e^{-|(\mathbf{x} - \mathbf{y}, -r)|}}{|(\mathbf{x} - \mathbf{y}, -r)|} \right] w((\mathbf{y}, 0)) \,\mathrm{d}\mathbf{y}\,\mathrm{d}r. \tag{22}$$

We now use the inequalities

$$|\mathbf{x} - \mathbf{y}| \le \left( |\mathbf{x} - \mathbf{y}|^2 + r^2 \right)^{\frac{1}{2}} \tag{23}$$

and

$$\frac{1}{4\pi} \left| \nabla\_0^\perp \frac{e^{-|(\mathbf{x}-\mathbf{y},-r)|}}{|(\mathbf{x}-\mathbf{y},-r)|} \right| \lesssim \left[ \frac{|\mathbf{x}-\mathbf{y}|}{(|\mathbf{x}-\mathbf{y}|^2+r^2)^{\frac{3}{2}}} + \frac{|\mathbf{x}-\mathbf{y}|}{|\mathbf{x}-\mathbf{y}|^2+r^2} \right] e^{-|(\mathbf{x}-\mathbf{y},-r)|}$$

to obtain

$$\begin{split} \|\mathbf{u}\|\_{\infty} &\lesssim \|w\|\_{\infty} \int\_{\mathbb{R}^3} \left[ \frac{1}{|\mathbf{x}-\mathbf{y}|^2 + r^2} + \frac{1}{(|\mathbf{x}-\mathbf{y}|^2 + r^2)^{\frac{1}{2}}} \right] e^{-|(\mathbf{x}-\mathbf{y}, -r)|} \,\mathrm{d}\mathbf{y}\,\mathrm{d}r \\ &\lesssim \|w\|\_{\infty} \int\_0^\infty \frac{e^{-s}}{s^2}\, s^2\,\mathrm{d}s + \|w\|\_{\infty} \int\_0^\infty \frac{e^{-s}}{s}\, s^2\,\mathrm{d}s \\ &\lesssim \|w\|\_{\infty}. \end{split} \tag{24}$$

Therefore, it follows from (21) and (24) that

$$\|\mathbf{u}\|\_{1,\infty} \lesssim L^{\frac{1}{2}} \|w\|\_{2,2} + (1 - \ln(L)) \|w\|\_{\infty}.\tag{25}$$

If $\|w\|_{2,2} \le 1$, we choose $L = 1$, and if $\|w\|_{2,2} > 1$, we take $L = \|w\|_{2,2}^{-2}$, so that (16) holds. This finishes the proof.
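Spelling out the effect of this choice of $L$ (a small verification step, not in the original argument):

```latex
% With L = \|w\|_{2,2}^{-2} (valid when \|w\|_{2,2} > 1, so that L \in (0,1]),
% the two terms of (25) become
L^{\frac{1}{2}}\|w\|_{2,2} = \|w\|_{2,2}^{-1}\|w\|_{2,2} = 1,
\qquad
1 - \ln(L) = 1 + 2\ln\|w\|_{2,2} = 1 + 2\ln^{+}\|w\|_{2,2},
```

so (25) reduces precisely to (16); when $\|w\|_{2,2} \le 1$, the choice $L = 1$ gives $\|\mathbf{u}\|_{1,\infty} \lesssim 1 + \|w\|_\infty$, which is again (16) since $\ln^+\|w\|_{2,2} = 0$.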

Before we end the subsection, we also note that a direct computation using the definition of Sobolev norms in frequency space (7) immediately yields

$$\|\mathbf{u}\|\_{k+1,2} \lesssim \|w\|\_{k,2} \tag{26}$$

for any $k \in \mathbb{N} \cup \{0\}$, where $w \in W^{k,2}(\mathbb{R}^2)$ is the given function in (15).

#### *2.3 A Priori Estimate*

In order to prove Theorem 2, we first need some preliminary estimates for *(b, q)*. In the following, we define

$$\|(b,q)\| := \|b\|\_{3,2} + \|q\|\_{2,2}.$$

**Lemma 1** *A strong solution of* (1)*–*(4) *satisfies the bound*

$$\frac{\mathrm{d}}{\mathrm{d}t} \|(b,q)\|^2 \lesssim \left(1 + \|\mathbf{u}\|\_{1,\infty} + \|\nabla b\|\_{\infty} + \|q\|\_{\infty}\right) \left(1 + \|(b,q)\|^2\right). \tag{27}$$

*Proof* Since the space of smooth functions is dense in the space $W^{3,2}(\mathbb{R}^2) \times W^{2,2}(\mathbb{R}^2)$ in which the solution exists, we work in the following with a smooth solution pair $(b, q)$. To achieve our desired estimate, we apply $\partial^\beta$ to (1) for $|\beta| \le 3$ to obtain

$$\partial\_t \partial^\beta b + \mathbf{u} \cdot \nabla \partial^\beta b = R\_1 \tag{28}$$

where

$$R\_1 := \mathbf{u} \cdot \partial^{\beta} \nabla b - \partial^{\beta} (\mathbf{u} \cdot \nabla b).$$

Now, since $\mathrm{div}\,\mathbf{u} = 0$, if we multiply (28) by $\partial^\beta b$ and integrate over space, the second term on the left-hand side of (28) vanishes after integration by parts. On the other hand, we can use the commutator estimate (see for instance [4, Sect. 2.2]) to estimate the residual term $R_1$. Consequently, by multiplying (28) by $\partial^\beta b$, integrating over space, and summing over the multi-indices $\beta$ with $|\beta| \le 3$, we obtain

$$\begin{split} \frac{\mathrm{d}}{\mathrm{d}t} \|b\|\_{3,2}^2 &\lesssim \left( \|\nabla \mathbf{u}\|\_{\infty} \|b\|\_{3,2} + \|\nabla b\|\_{\infty} \|\mathbf{u}\|\_{3,2} \right) \|b\|\_{3,2} \\ &\lesssim \left( \|\nabla \mathbf{u}\|\_{\infty} + \|\nabla b\|\_{\infty} \right) (1 + \|(b, q)\|^2) \end{split} \tag{29}$$

where we have used (26) with $w = q - f$ and $k = 2$. Next, we find a bound for $\|q\|_{2,2}^2$. For this, we apply $\partial^\beta$ to (2) for $|\beta| \le 2$ and obtain

$$\partial\_t \partial^\beta q + \mathbf{u} \cdot \nabla \partial^\beta (q - b) + \mathbf{u}\_h \cdot \nabla \partial^\beta b = R\_2 + R\_3 + R\_4 \tag{30}$$

where

$$R\_2 := \mathbf{u} \cdot \partial^{\beta} \nabla q - \partial^{\beta} (\mathbf{u} \cdot \nabla q),$$

$$R\_3 := -\mathbf{u} \cdot \partial^{\beta} \nabla b + \partial^{\beta} (\mathbf{u} \cdot \nabla b),$$

$$R\_4 := \mathbf{u}\_h \cdot \partial^{\beta} \nabla b - \partial^{\beta} (\mathbf{u}\_h \cdot \nabla b).$$

Now notice that for $\mathbb{U} := \nabla\mathbf{u}$, it follows from interpolation that

$$\|\nabla \mathbb{U}\|\_{4} \lesssim \|\mathbb{U}\|\_{\infty}^{\frac{1}{2}} \|\nabla^{2} \mathbb{U}\|\_{2}^{\frac{1}{2}}$$

and so,

$$\|\nabla^2 \mathbf{u}\|\_4 \lesssim \|\nabla \mathbf{u}\|\_{\infty}^{\frac{1}{2}} \|\mathbf{u}\|\_{3,2}^{\frac{1}{2}}.$$

Similarly

$$\|\nabla q\|\_{4} \lesssim \|q\|\_{\infty}^{\frac{1}{2}} \|q\|\_{2,2}^{\frac{1}{2}}.$$

Therefore,

$$\|\nabla q\|\_4 \|\nabla^2 \mathbf{u}\|\_4 \lesssim \|q\|\_\infty \|\mathbf{u}\|\_{3,2} + \|\nabla \mathbf{u}\|\_\infty \|q\|\_{2,2}.$$
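The last display is an instance of Young's inequality $ab \le \frac{1}{2}(a^2 + b^2)$ applied to the two interpolation bounds; spelling it out:

```latex
% With a = \|q\|_\infty^{1/2}\|\mathbf{u}\|_{3,2}^{1/2} and
%      b = \|\nabla\mathbf{u}\|_\infty^{1/2}\|q\|_{2,2}^{1/2}:
\|\nabla q\|_4 \,\|\nabla^2 \mathbf{u}\|_4
  \lesssim a\,b
  \le \tfrac{1}{2}\left(a^2 + b^2\right)
  = \tfrac{1}{2}\left(\|q\|_\infty\|\mathbf{u}\|_{3,2}
      + \|\nabla\mathbf{u}\|_\infty\|q\|_{2,2}\right).
```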

By using this estimate, we deduce from (26) and commutator estimates that

$$\|R\_2\|\_2 \lesssim \|\nabla \mathbf{u}\|\_{\infty} \|q\|\_{2,2} + \|q\|\_{\infty} (1 + \|q\|\_{2,2}).\tag{31}$$

The commutators *R*<sup>3</sup> and *R*<sup>4</sup> are easy to estimate and are given by

$$\|R\_3\|\_2 \lesssim \|\nabla \mathbf{u}\|\_{\infty} \|b\|\_{3,2} + \|\nabla b\|\_{\infty} (1 + \|q\|\_{2,2}),\tag{32}$$

$$\|R\_4\|\_2 \lesssim \|b\|\_{3,2} + \|\nabla b\|\_{\infty},\tag{33}$$

respectively, for a given $\mathbf{u}_h \in W^{3,2}(\mathbb{R}^2; \mathbb{R}^2)$. Next, by using $\mathrm{div}\,\mathbf{u} = 0$, we obtain

$$\left< \left( \mathbf{u} \cdot \nabla \partial^{\beta} q \right), \ \partial^{\beta} q \right> = 0. \tag{34}$$

Additionally, the following estimates hold true:

$$\left| \left( \left( \mathbf{u} \cdot \nabla \partial^{\beta} b \right) , \ \partial^{\beta} q \right) \right| \lesssim \| \mathbf{u} \| \_{\infty} \| b \|\_{3,2}^{2} + \| \mathbf{u} \| \_{\infty} \| q \|\_{2,2}^{2},\tag{35}$$

$$\left| \left( \left( \mathbf{u}\_h \cdot \nabla \partial^{\beta} b \right) , \ \partial^{\beta} q \right) \right| \lesssim \|b\|\_{3,2}^2 + \|q\|\_{2,2}^2 \tag{36}$$

since $\mathbf{u}_h \in W^{3,2}(\mathbb{R}^2; \mathbb{R}^2)$. If we now collect the estimates above (keeping in mind that $f \in W^{2,2}(\mathbb{R}^2)$ and $\mathbf{u}_h \in W^{3,2}(\mathbb{R}^2; \mathbb{R}^2)$), we obtain, by multiplying (30) by $\partial^\beta q$ and then summing over $|\beta| \le 2$, the following:

$$\frac{\mathrm{d}}{\mathrm{d}t} \|q\|\_{2,2}^2 \lesssim \left(1 + \|\mathbf{u}\|\_{1,\infty} + \|\nabla b\|\_{\infty} + \|q\|\_{\infty}\right) \left(1 + \|(b,q)\|^2\right). \tag{37}$$

Summing up (29) and (37) yields the desired result.

We now have everything in hand to prove our main theorem, Theorem 2.

*Proof of Theorem 2* In the following, we define the time-dependent function $g$ as

$$g(t) := \mathrm{e} + \|(b, q)(t)\|, \quad \text{for} \quad t \in [0, T]. \tag{38}$$

Next, without loss of generality, we assume that *f* = 0 so that from Proposition 1, we obtain

$$\|\mathbf{u}(t)\|\_{1,\infty} \lesssim 1 + \left(1 + \ln^+ \|q(t)\|\_{2,2}\right) \left(\|\nabla b(t)\|\_{\infty} + \|q(t)\|\_{\infty}\right) \tag{39}$$

for *t* ∈ [0*, T* ]. Using the monotonic properties of logarithms, it follows from the above that

$$\|\mathbf{u}(t)\|\_{1,\infty} \lesssim 1 + \ln[g(t)](\|\nabla b(t)\|\_{\infty} + \|q(t)\|\_{\infty}).\tag{40}$$

Furthermore, since <sup>1</sup> <sup>≤</sup> ln*(*e+|*x*|*)* for any *<sup>x</sup>* <sup>∈</sup> <sup>R</sup>, we can deduce from the inequality above that

$$\|\mathbf{u}(t)\|\_{1,\infty} + \|\nabla b(t)\|\_{\infty} + \|q(t)\|\_{\infty} \lesssim 1 + \ln[g(t)]\left(\|\nabla b(t)\|\_{\infty} + \|q(t)\|\_{\infty}\right).\tag{41}$$

On the other hand, it follows from Lemma 1 that

$$g(t) \le g(0)\exp\left(c\int\_0^t \left(1 + \|\mathbf{u}(s)\|\_{1,\infty} + \|\nabla b(s)\|\_{\infty} + \|q(s)\|\_{\infty}\right)\mathrm{d}s\right) \tag{42}$$

for any *t* ∈ [0*, T* ]. Combining (41) and (42) yields

$$g(t) \le g(0)\exp\left(c\int\_0^t \left(1 + \ln[g(s)]\left(\|\nabla b(s)\|\_{\infty} + \|q(s)\|\_{\infty}\right)\right)\mathrm{d}s\right). \tag{43}$$

We can now take the logarithm of both sides and apply Grönwall's lemma to the resulting inequality to obtain

$$\ln[g(t)] \le \left(\ln[g(0)] + cT\right) \exp\left(c \int\_0^t \left(\|\nabla b(s)\|\_{\infty} + \|q(s)\|\_{\infty}\right) ds\right). \tag{44}$$
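In more detail, the passage from (43) to (44) can be spelled out; writing $A(s) := \|\nabla b(s)\|_\infty + \|q(s)\|_\infty$:

```latex
% Taking logarithms in (43):
\ln[g(t)] \;\le\; \ln[g(0)] + c\,t + c\int_0^t \ln[g(s)]\,A(s)\,\mathrm{d}s .
% This inequality is linear in \ln[g], so Gronwall's lemma yields
\ln[g(t)] \;\le\; \bigl(\ln[g(0)] + c\,t\bigr)\exp\!\Bigl(c\int_0^t A(s)\,\mathrm{d}s\Bigr),
% and bounding t by T recovers (44).
```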

At this point, we can utilize (8), take exponentials in (44), and obtain

$$\|(b,q)(t)\| \le \left[g(0)\right]^{\exp(cK)} \exp\left[cT \exp(cK)\right]\tag{45}$$

for any $t \in [0, T]$. Since the right-hand side is finite, it follows that the solution $(b, q)$ can be continued to some interval $[0, T')$ with $T' > T$. This finishes the proof.

**Acknowledgments** This work has been supported by the European Research Council (ERC) Synergy grant STUOD-DLV-856408.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Modeling Under Location Uncertainty: A Convergent Large-Scale Representation of the Navier-Stokes Equations**

#### **Arnaud Debussche, Berenger Hug, and Etienne Mémin**

**Abstract** We construct martingale solutions for the stochastic Navier-Stokes equations in the framework of modelling under location uncertainty (LU). These solutions are pathwise and unique when the spatial dimension is 2. We then prove that, as the noise intensity goes to zero, these solutions converge, up to a subsequence in dimension 3, to a solution of the deterministic Navier-Stokes equations. This guarantees that the LU Navier-Stokes equations can be interpreted as a large-scale model of the deterministic Navier-Stokes equations.

#### **1 Introduction**

For several years there has been a burst of activity to devise stochastic representations of fluid flow dynamics. These models are strongly motivated by climate and weather forecasting issues and, in particular, by the need to provide accurate ensembles of large-scale flow realisations [2]. Yet, elaborating such stochastic dynamics on *ad hoc* grounds can be highly detrimental to the system of interest [4]. A minimal mathematical requirement for a satisfactory large-scale flow representation is that a weak solution of the Large Eddy Simulation (LES) scheme converges toward a weak solution of the fine-scale deterministic Navier-Stokes equations in 3D, and toward the unique solution for the 2D Navier-Stokes equations. The convergence of some classical LES models toward the true fine-scale dynamics is well known in the deterministic case [3, 7]. However, the question of the convergence of stochastic parametrizations toward solutions of the deterministic equations in the limit of vanishing noise is not always clear.

A. Debussche
Univ Rennes, CNRS, IRMAR - UMR 6625, Rennes Cedex, France
e-mail: arnaud.debussche@ens-rennes.fr

B. Hug · E. Mémin (✉)
Inria/IRMAR Campus de Beaulieu, Rennes Cedex, France
e-mail: berenger.hug@ens-rennes.fr; etienne.memin@inria.fr

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_2

In this study we show that stochastic Navier-Stokes models defined within the modelling under location uncertainty (LU) principle [9] have martingale solutions in 3D and a unique strong solution—in the probabilistic sense—in 2D. Moreover, in 3D, in the limit of vanishing noise, there exists a subsequence converging in law toward a weak solution of the deterministic Navier-Stokes equations, and in 2D the whole sequence converges toward the unique solution. As such, these results enable us to consider the LU representation as a valid large-scale stochastic representation of flow dynamics, one that is more amenable to ensemble forecasting and data assimilation than deterministic models due to its improved variability.

#### **2 Modelling Under Location Uncertainty**

The LU formulation relies mainly on the following time-scale separation assumption of the flow:

$$\mathrm{d}X\_t = u(X\_t, t)\,\mathrm{d}t + \sigma(X\_t, t)\,\mathrm{d}W\_t,\tag{1}$$

where $X : \mathbb{R}^+ \times \Omega \to \mathcal{S}$ is the Lagrangian displacement defined within the bounded domain $\mathcal{S} \subset \mathbb{R}^d$ ($d = 2$ or $3$) with smooth boundary, and $u : \mathbb{R}^+ \times \mathcal{S} \times \Omega \to \mathbb{R}^d$ denotes the large-scale velocity, which is both spatially and temporally correlated, while $\sigma\mathrm{d}W$ is a highly oscillating unresolved component (also called the noise term) that is correlated in space only.

More precisely, we consider a cylindrical Wiener process *<sup>W</sup>* on *<sup>L</sup>*2*(*S*,* <sup>R</sup>*<sup>d</sup> )*, the space of square integrable functions on <sup>S</sup> with values in <sup>R</sup>*<sup>d</sup>* ,

$$W = \sum\_{i \in \mathbb{N}} \hat{\beta}^{i} e\_{i},$$

where $(e_i)_{i\in\mathbb{N}}$ is a Hilbertian orthonormal basis of $L^2(\mathcal{S}, \mathbb{R}^d)$ and $(\hat{\beta}^i)_{i\in\mathbb{N}}$ is a sequence of independent standard Brownian motions on a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$ ([11]). The series above does not converge in $L^2(\mathcal{S}, \mathbb{R}^d)$ but in any larger Hilbert space $U$ such that the embedding of $L^2(\mathcal{S}, \mathbb{R}^d)$ into $U$ is Hilbert-Schmidt; for instance, $U$ can be the $L^2(\mathcal{S})$-based Sobolev space $H^{-\alpha}(\mathcal{S})$ for some $\alpha > d/2$.

The spatial structure of the noise is specified through a time-dependent deterministic integral covariance operator $\sigma_t$ defined from a bounded and symmetric kernel $\hat{\sigma}$:

$$\sigma\_t f(\mathbf{x}) := \int\_{\mathcal{S}} \widehat{\sigma}(\mathbf{x}, \mathbf{y}, t) \, f(\mathbf{y}) \,\mathrm{d}\mathbf{y}, \quad f \in L^2(\mathcal{S}, \mathbb{R}^d).$$

For each $(\mathbf{x}, \mathbf{y}, t)$, $\widehat{\sigma}(\mathbf{x}, \mathbf{y}, t)$ is a $d \times d$ symmetric tensor. Since $\widehat{\sigma}$ is bounded in $\mathbf{x}$, $\mathbf{y}$ and $t$, $\sigma_t$ maps $L^2(\mathcal{S}, \mathbb{R}^d)$ into itself and is Hilbert-Schmidt. Then, the noise can be written as the Wiener process:

$$\sigma\_t W\_t = \sum\_{i \in \mathbb{N}} \hat{\beta}\_t^i \, \sigma\_t e\_i,$$

where the series converges in $L^2(\mathcal{S}, \mathbb{R}^d)$ almost surely and in $L^p(\Omega)$ for all $p \in \mathbb{N}$, and Eq. (1) should be understood in the Itô sense. We may further write the dependence of the Wiener process on the other variables:

$$\sigma\_t W\_t(\mathbf{x}, \omega) = \sum\_{i \in \mathbb{N}} \hat{\beta}\_t^i(\omega) \, \sigma\_t e\_i(\mathbf{x}).$$

We consider a divergence free noise:

$$\nabla\_{\mathbf{x}} \cdot \hat{\sigma}(\mathbf{x}, \mathbf{y}, t) = 0, \quad \mathbf{x}, \mathbf{y} \in \mathcal{S}, \ t \ge 0.$$

Also, for each $t \in \mathbb{R}^+$, there exists a complete orthogonal system $(\phi_n(t))_n$ composed of eigenfunctions of the covariance operator at time $t$, together with another sequence of independent standard Brownian motions $(\beta^k)_k$ on the same stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$, such that we have the representation:

$$
\sigma_t\, \mathrm{d}W_t = \sum_{k=0}^{\infty} \phi_k(t)\, \mathrm{d}\beta_t^k.
$$

This Gaussian random field is associated with the two-time, two-point covariance tensor given by

$$\mathcal{Q}(\mathbf{x}, \mathbf{y}, t, t') = \mathbb{E}\left(\sigma_t \mathrm{d}W_t(\mathbf{x}) \left[\sigma_{t'} \mathrm{d}W_{t'}\right]^T(\mathbf{y})\right) = \int_{\mathcal{S}} \hat{\sigma}(\mathbf{x}, \mathbf{z}, t)\, \hat{\sigma}(\mathbf{y}, \mathbf{z}, t')\, \mathrm{d}\mathbf{z}\; \delta(t - t'),$$

with the diagonal part (i.e., the one-time auto-correlation), referred to in the following as the variance tensor, and denoted by

$$a(\mathbf{x},t) = \int\_{\mathcal{S}} \widehat{\sigma}(\mathbf{x},\mathbf{y},t)\,\widehat{\sigma}(\mathbf{x},\mathbf{y},t)d\mathbf{y} \,=\sum\_{k=0}^{\infty} \phi\_{k}(\mathbf{x},t)\,\phi\_{k}^{T}(\mathbf{x},t). \tag{2}$$
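As an illustration of (2), the variance tensor of a truncated expansion can be checked numerically. The sketch below is hypothetical (the 1-D modes are arbitrary smooth choices, not the paper's eigenfunctions): it compares $a(x) = \sum_k \phi_k^2(x)$ with the Monte-Carlo covariance of sampled increments $\sigma\,\mathrm{d}W$.

```python
# Sketch (assumed toy setting): build a 1-D noise from a truncated
# Karhunen-Loeve expansion and check that the variance tensor
# a(x) = sum_k phi_k(x)^2 of (2) matches the Monte-Carlo covariance
# of the increments sigma dW.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
K = 8                                    # truncation level
# hypothetical modes with decaying amplitude (play the role of phi_k)
phi = np.array([np.sin((k + 1) * x) / (k + 1) for k in range(K)])

a_exact = (phi ** 2).sum(axis=0)         # diagonal of the variance tensor (2)

dt, M = 1e-3, 20000                      # time step and Monte-Carlo samples
dB = rng.normal(0.0, np.sqrt(dt), size=(M, K))
increments = dB @ phi                    # samples of sigma dW at the grid points
a_mc = (increments ** 2).mean(axis=0) / dt

rel_err = np.abs(a_mc - a_exact).max() / a_exact.max()
```

With 20 000 samples the empirical variance agrees with $a(x)$ to within a few percent, as expected from the $O(M^{-1/2})$ Monte-Carlo error.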

In a way similar to the classical derivation of the Navier-Stokes equations, the LU setting is based on a stochastic representation of the Reynolds transport theorem (SRTT) [9], describing the rate of change of a random scalar $q$ within a volume $V(t)$ transported by the stochastic flow (1). For incompressible unresolved flows (i.e. $\nabla \cdot \sigma = 0$), the SRTT reads

$$\mathrm{d}\left(\int_{V(t)} q(\mathbf{x}, t)\, \mathrm{d}\mathbf{x}\right) = \int_{V(t)} \left(\mathbb{D}_t q + q\, \nabla \cdot (\boldsymbol{u} - \boldsymbol{u}_s)\, \mathrm{d}t\right) \mathrm{d}\mathbf{x},\tag{3a}$$

$$\mathbb{D}_t q = \mathrm{d}_t q + (\boldsymbol{u} - \boldsymbol{u}_s) \cdot \nabla q\, \mathrm{d}t + \sigma\, \mathrm{d}W_t \cdot \nabla q - \frac{1}{2} \nabla \cdot (a \nabla q)\, \mathrm{d}t,\tag{3b}$$

where $\mathrm{d}_t q(\mathbf{x}, t) = q(\mathbf{x}, t + \mathrm{d}t) - q(\mathbf{x}, t)$ stands for the forward time increment of $q$ at a fixed point $\mathbf{x}$, and $\mathbb{D}_t$ is the stochastic transport operator introduced in [9, 12], which plays the role of the material derivative. Recall that $\boldsymbol{u}$ is the large-scale velocity used in (1) and $a$ is defined in (2). Note also that we omit to mention the dependence of $\sigma$ on time.

This operator is derived from the Itô-Wentzell formula [8] to express the differentiation of a stochastic process transported by the flow [9]. The drift $\boldsymbol{u}_s = \frac{1}{2} \nabla \cdot a$, coined the Itô-Stokes drift (ISD) in [1], represents, through the divergence of the variance tensor, the effects of the small-scale inhomogeneity on the large-scale flow component. This term can be understood as a generalization of the Stokes drift associated with the waves' orbital motion. In addition to this modified advection, the stochastic transport operator involves an inhomogeneous diffusion driven by the variance tensor, which can be interpreted as a subgrid diffusion term attached to the mixing operated by the small scales. It can be noticed that this term would only be implicitly represented in the Stratonovich integral form; the ISD, however, would remain [1]. The remaining term corresponds to the advection by the random term. A direct application of the Itô formula to the norm of the scalar shows that the positive energy brought by this (backscattering) term is exactly compensated by the energy lost through the diffusion [12]. As a consequence, the energy of a transported quantity is conserved pathwise, in other words for any realization of the flow.
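This pathwise energy conservation can be observed on a toy example. The sketch below uses an assumed setting, not the paper's: a scalar advected on a periodic 1-D domain by a spatially uniform noise, for which the exact Itô solution of the transport equation is $q_0(x - \sigma W_t)$, so each time step is a Fourier phase shift and the $L^2$ energy is conserved for every realization.

```python
# Toy illustration (assumed 1-D periodic setting): transport of a scalar
# by a spatially uniform noise sigma dW.  The exact solution is
# q0(x - sigma*W_t); each step is a unitary Fourier phase shift, so the
# discrete L2 energy is conserved pathwise.
import numpy as np

rng = np.random.default_rng(1)
N = 128
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
k = np.fft.fftfreq(N, d=1.0 / N)         # integer wavenumbers
q = np.exp(-4 * (x - np.pi) ** 2)        # initial scalar blob
sigma, dt, nsteps = 0.5, 1e-2, 200

energy0 = np.sum(q ** 2)
for _ in range(nsteps):
    dW = rng.normal(0.0, np.sqrt(dt))
    # exact update q(x) <- q(x - sigma*dW): phase shift in Fourier space
    q = np.real(np.fft.ifft(np.fft.fft(q) * np.exp(-1j * k * sigma * dW)))
energy = np.sum(q ** 2)
energy_drift = abs(energy - energy0) / energy0
```

The relative energy drift stays at the level of floating-point round-off for the sampled Brownian path, illustrating the exact balance between backscattering and diffusion.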

The above SRTT (3a) and Newton's second principle (in a distributional sense) allow us to derive the following stochastic equations of motion (see Sect. 5 of [9] or Sect. 2.2–2.3 of [10]), which, for any noise scaling parameter $\varepsilon > 0$ and at all points of $\mathcal{S}$, read, using the $\sigma$, $\boldsymbol{u}_s$ and $a$ introduced above:

$$\mathrm{d}_t \boldsymbol{u} + (\boldsymbol{u} - \varepsilon^2 \boldsymbol{u}_s) \cdot \nabla \boldsymbol{u}\, \mathrm{d}t + \varepsilon\, \sigma \mathrm{d}W_t \cdot \nabla \boldsymbol{u} - \frac{1}{2} \varepsilon^2 \nabla \cdot (a \nabla \boldsymbol{u})\, \mathrm{d}t = -\frac{1}{\rho} \nabla (p\, \mathrm{d}t + \mathrm{d}p_t^{\sigma}) + \frac{1}{Re} \Delta (\boldsymbol{u}\, \mathrm{d}t + \varepsilon\, \sigma\, \mathrm{d}W_t),\tag{4}$$

with the incompressibility conditions

$$
\nabla \cdot (\boldsymbol{u} - \varepsilon^2 \boldsymbol{u}_s) = 0, \qquad \nabla \cdot \sigma = 0, \tag{5}
$$

and associated with the Dirichlet boundary conditions $\boldsymbol{u}(t, \mathbf{x}) = 0$ and $\hat{\sigma}(\mathbf{x}, \mathbf{y}, t) = 0$ for all $\mathbf{x} \in \partial\mathcal{S}$ and $t > 0$. The initial condition is denoted by $\boldsymbol{u}(0, \mathbf{x}) = \boldsymbol{u}_0(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. As usual, $\boldsymbol{u}(t, \mathbf{x}) = (u_1(t, \mathbf{x}), \ldots, u_d(t, \mathbf{x}))$ and $p(t, \mathbf{x})$ stand for the velocity and the pressure of the fluid, respectively. The term $\mathrm{d}p_t^{\sigma}$ corresponds to the Brownian (martingale) part of the pressure. The Itô-Stokes drift is defined as $\boldsymbol{u}_s := \frac{1}{2} \nabla \cdot a$ and $\rho$ stands for the fluid density. The dimensionless constant $Re = UL/\nu$ denotes the Reynolds number, set from the ratio of the product of characteristic length and velocity scales, $UL$, to the kinematic viscosity $\nu$. As for the noise scaling parameter $\varepsilon$, it encodes the scale of the unresolved energy and should tend to zero when all the flow components are resolved: in that limit there is no noise and the system reduces trivially to the deterministic Navier-Stokes system.

Although the system reduces to Navier-Stokes for zero noise, the convergence toward weak (respectively strong) solutions of the 3D (respectively 2D) deterministic Navier-Stokes equations in the limit of vanishing noise needs to be assessed. These are the results we aim to prove in this paper.

First of all, in order to work with a pressure-free system through a divergence-free Leray projection, we proceed to the change of variable $v := \boldsymbol{u} - \varepsilon^2 \boldsymbol{u}_s$ in (4) to rewrite the system with a classical incompressibility condition on $v$:

$$\begin{aligned} \mathrm{d}_t v &+ v \cdot \nabla v\, \mathrm{d}t - \frac{1}{Re} \Delta v\, \mathrm{d}t + \varepsilon^2 (v \cdot \nabla) \boldsymbol{u}_s\, \mathrm{d}t - \frac{\varepsilon^2}{2} \nabla \cdot (a \nabla v)\, \mathrm{d}t - \frac{\varepsilon^4}{2} \nabla \cdot (a \nabla \boldsymbol{u}_s)\, \mathrm{d}t - \frac{\varepsilon^2}{Re} \Delta \boldsymbol{u}_s\, \mathrm{d}t + \varepsilon^2 \partial_t \boldsymbol{u}_s\, \mathrm{d}t \\ &= -\frac{1}{\rho} \nabla (p\, \mathrm{d}t + \mathrm{d}p_t^{\sigma}) - (\varepsilon\, \sigma \mathrm{d}W_t \cdot \nabla) v - (\varepsilon^3 \sigma \mathrm{d}W_t \cdot \nabla) \boldsymbol{u}_s + \frac{\varepsilon}{Re} \Delta (\sigma\, \mathrm{d}W_t), \end{aligned}\tag{6}$$

with the incompressibility conditions

$$
\nabla \cdot v = 0, \qquad \nabla \cdot \sigma = 0, \tag{7}
$$

for all points in $\mathcal{S}$, together with the Dirichlet boundary conditions $v(t, \mathbf{x}) = 0$, $\hat{\sigma}(\mathbf{x}, \mathbf{y}, t) = 0$ for all $\mathbf{x} \in \partial\mathcal{S}$ and $t > 0$, and the initial condition $v(0, \mathbf{x}) = v_0(\mathbf{x}) := \boldsymbol{u}_0(\mathbf{x}) - \varepsilon^2 \boldsymbol{u}_s(0, \mathbf{x})$ for all $\mathbf{x} \in \mathcal{S}$. In the following section we specify the spaces on which this system is defined, rewrite it in an equivalent abstract form and state our main result.

#### **3 Notations and Main Result**

Let $\mathcal{V}$ be the space of infinitely differentiable $d$-dimensional vector fields $u$ on $\mathcal{S}$, with compact support strictly contained in $\mathcal{S}$ and satisfying $\nabla \cdot u = 0$. We denote by $H$ the closure of $\mathcal{V}$ in $L^2(\mathcal{S}, \mathbb{R}^d)$ and by $V$ the closure of $\mathcal{V}$ in the Sobolev space $H^1(\mathcal{S}, \mathbb{R}^d)$. The space $H$ is endowed with the $L^2(\mathcal{S}, \mathbb{R}^d)$ inner product. This inner product and its induced norm are denoted by:

$$(u, v)_H := (u, v)_{L^2(\mathcal{S})} \quad \text{and} \quad |u|_H := \|u\|_{L^2(\mathcal{S})}.$$

As for the space $V$, thanks to the Poincaré inequality, it is endowed with the $H_0^1(\mathcal{S}, \mathbb{R}^d)$ inner product and its associated norm, denoted respectively by

$$((u, v))_V := (\nabla u, \nabla v)_{L^2(\mathcal{S})} \quad \text{and} \quad \|u\|_V := \|\nabla u\|_{L^2(\mathcal{S})}.$$

We may then define the Gelfand triple $V \subset H \subset V'$, where $V'$ is the dual space of $V$ relative to $H$. We denote by $\langle \cdot, \cdot \rangle_{V' \times V}$ the duality pairing between $V'$ and $V$. The space of Hilbert-Schmidt operators from $H$ to $H$ is denoted by $\mathcal{L}_2(H)$ and $\|\cdot\|_{\mathcal{L}_2}$ is its norm.

System (4) may be rewritten in an equivalent simplified pressure-free formulation by using the Leray projection $P : L^2(\mathcal{S}, \mathbb{R}^d) \to H$ of $L^2(\mathcal{S}, \mathbb{R}^d)$ onto the space $H$ of divergence-free vector-valued functions. Applying Leray's projector to (6), we obtain

$$\begin{split} \mathrm{d}_t v - \frac{1}{Re} P(\Delta v\, \mathrm{d}t) &+ P(v \cdot \nabla v\, \mathrm{d}t) \\ &+ P\left(\varepsilon^2 (v \cdot \nabla) \boldsymbol{u}_s\, \mathrm{d}t - \frac{\varepsilon^2}{2} \nabla \cdot (a \nabla v)\, \mathrm{d}t - \frac{\varepsilon^4}{2} \nabla \cdot (a \nabla \boldsymbol{u}_s)\, \mathrm{d}t - \frac{\varepsilon^2}{Re} \Delta \boldsymbol{u}_s\, \mathrm{d}t + \varepsilon^2 \partial_t \boldsymbol{u}_s\, \mathrm{d}t\right) \\ &= P\left(\frac{\varepsilon}{Re} \Delta (\sigma \mathrm{d}W_t) - (\varepsilon\, \sigma \mathrm{d}W_t \cdot \nabla) v - (\varepsilon^3 \sigma \mathrm{d}W_t \cdot \nabla) \boldsymbol{u}_s\right). \end{split} \tag{8}$$

This system can finally be rewritten in the following simplified abstract form

$$\begin{cases} \mathrm{d}_t v(t) + A v(t)\, \mathrm{d}t + B v(t)\, \mathrm{d}t + F_{\varepsilon} v(t)\, \mathrm{d}t = G_{\varepsilon} v(t)\, \mathrm{d}W_t, \\ v(0) = v_0. \end{cases} \tag{9}$$

The deterministic terms *A*, *B*, *Fε* and the stochastic term *Gε* are described below.

Several kinds of solutions can be defined for stochastic partial differential equations. As for deterministic PDEs, these can be strong, weak or mild (semigroup) solutions. When the solutions are constructed for a fixed Wiener process $W$ on a given stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$, they are strong in the probabilistic sense. As usual in 3D, due to the lack of uniqueness, we work with weaker solutions, called martingale solutions, which consist of a triplet composed of a stochastic basis, a Wiener process and an adapted process.

More precisely, we say that there is a martingale solution of system (9) if there exist a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\in[0,T]}, \mathbb{P})$, a cylindrical Wiener process $W$ on $L^2(\mathcal{S}; \mathbb{R}^d)$ and a progressively measurable process $v : [0, T] \times \Omega \to H$, with

$$v \in L^2\left(\Omega \times [0, T]; V\right) \cap L^2\left(\Omega,\ C^0([0, T]; H)\right),$$

such that $\mathbb{P}$-a.e., $v$ satisfies for all $t \in [0, T]$

$$v(t) + \int_0^t A v(s)\, \mathrm{d}s + \int_0^t B v(s)\, \mathrm{d}s + \int_0^t F v(s)\, \mathrm{d}s = v_0 + \int_0^t G(v(s))\, \mathrm{d}W_s,\tag{10}$$

where the equality must be understood in the weak sense. We will show, for all $\varepsilon > 0$, the existence in 3D of a martingale solution for the LU representation of the Navier-Stokes equations, for noises associated with a diffusion tensor kernel $\hat{\sigma}$ smooth enough in space and time. In 2D, this solution is unique and strong in the probabilistic sense. This result is summarized in the following theorem.

**Theorem 1** *Let $d = 2$ or $3$ and assume that the noise is smooth enough, in the sense that its variance tensor and Itô-Stokes drift are such that*

$$\sup\_{t \in [0,T]} \sum\_{k=0}^{\infty} \left\| \phi\_k(t) \right\|\_{H^3(\mathcal{S})}^2 < \infty,\tag{11}$$

$$u_s \in L^{\infty}(0, T; H^3(\mathcal{S}, \mathbb{R}^d)),\ \partial_t u_s \in L^{\infty}(0, T; H) \text{ and } a \nabla u_s \in L^{\infty}(0, T; V). \tag{12}$$

*Then, for all $\varepsilon > 0$, Eq. (10) admits a martingale solution. Moreover, for $d = 2$, any solution of (10) is strong in the probabilistic sense and unique.*

*Moreover, when $\varepsilon \to 0$, for $d = 3$, there exists a subsequence of $(u_{\varepsilon})_{\varepsilon>0}$ which converges in law to a solution of the deterministic Navier-Stokes equation. For $d = 2$, the whole sequence converges to the unique solution of the Navier-Stokes equation.*

The conditions of Theorem 1 simplify when the covariance operator does not depend on time or when the ISD is divergence free. In both cases the condition on the temporal derivative of the ISD is not necessary. We note also that, for a spatially homogeneous noise, the variance tensor is constant and the ISD vanishes; however, this may happen only on a periodic domain or on the full space. The assumptions on the noise are not optimal, but it is not the purpose of this paper to consider spatially non-smooth noise, since in practice the noise is smooth.

Note that condition (11) is satisfied, for instance, if we choose $\sigma$ independent of $t$ and equal to $A^{-r}$ with $r$ large enough, where $A$ is the Stokes operator defined below. Indeed, in this case $\phi_k = \lambda_k^{-r} e_k$, where $(e_k)_k$ is a complete orthonormal system of eigenvectors of $A$ associated with the eigenvalues $(\lambda_k)_k$, and $\|\phi_k(t)\|_{H^3(\mathcal{S})}^2 = \lambda_k^{3-2r}$. The behavior of the eigenvalues, $\lambda_k \sim k^{2/d}$, allows us to conclude that (11) follows. Since $u_s = \frac{1}{2} \nabla \cdot a$ and $a$ is defined by (2), (12) also holds for $r$ large enough, since $\|u_s\|_{H^3(\mathcal{S})} \le \sum_{k=0}^{\infty} \|\phi_k(t)\|_{H^4(\mathcal{S})}^2$. Finally, since $A^{-r}$ is self-adjoint and Hilbert-Schmidt for $r > d/4$, it is associated with a symmetric kernel $\hat{\sigma}$, which is bounded for $r$ large enough.
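The summability behind this choice can be made explicit (a short computation, not spelled out in the text): with $\lambda_k \sim k^{2/d}$,

```latex
\sup_{t\in[0,T]}\sum_{k=0}^{\infty}\|\phi_k(t)\|_{H^3(\mathcal{S})}^2
  = \sum_{k}\lambda_k^{\,3-2r}
  \;\sim\; \sum_{k\ge 1} k^{\frac{2}{d}(3-2r)} < \infty
\quad\Longleftrightarrow\quad
\frac{2}{d}(2r-3) > 1
\quad\Longleftrightarrow\quad
r > \frac{3}{2} + \frac{d}{4}.
```

In particular, for $d = 3$ any exponent $r > 9/4$ makes (11) hold.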

These convergence results open new interesting possibilities for the study of turbulence or for proposing new large-scale representations of fluid dynamics. From the theoretical point of view, it might be interesting to explore multiscale versions of the LU representation based on spatial filtering together with nested noise models. This would generalize classical large-eddy models, with a noise depending on the spatial filtering applied: the coarser the filtering, the larger the noise. Energy transfer between scales would then be very interesting to study in this probabilistic setting. Stochastic Kármán-Howarth-Monin equations for energy exchanges across scales could be obtained in this way. From a practical point of view, these convergence results justify the use of such stochastic models to represent large-scale solutions of the Navier-Stokes equations.


#### **4 Proofs of the Main Result**

We introduce the Stokes operator $Av := -\frac{1}{Re} P(\Delta v)$ on the domain $\mathcal{D}(A) := V \cap H^2(\mathcal{S}, \mathbb{R}^d)$. Let $b$ be the trilinear form and $B$ the bilinear operator defined for all $u, v$ and $w \in V$ by

$$b(\boldsymbol{u}, \boldsymbol{v}, \boldsymbol{w}) = \int\_{\mathcal{S}} \boldsymbol{w}(\boldsymbol{x}) \left[ \boldsymbol{u}(\boldsymbol{x}) \cdot \nabla \right] \boldsymbol{v}(\boldsymbol{x}) \, d\boldsymbol{x} = (\boldsymbol{B}(\boldsymbol{u}, \boldsymbol{v}), \boldsymbol{w})\_{\boldsymbol{H}}.$$

Recall that for all *u, v* and *w* ∈ *V* : *b(u, v, w)* = −*b(u, w, v)*. As usual, we set *B(u)* = *B(u, u)*. We then define *F* by:

$$F(v) = \varepsilon^2 B(v, u_s) - \frac{\varepsilon^2}{2} P \nabla \cdot (a \nabla v) - \frac{\varepsilon^4}{2} P \nabla \cdot (a \nabla u_s) - \varepsilon^2 A u_s + \varepsilon^2 \partial_t u_s, \quad v \in V. \tag{13}$$

It can be seen that $F(v) \in V'$. We next write the noise term as

$$G(v)\, \mathrm{d}W_t = \sum_{k=0}^{\infty} \left(-\varepsilon A \phi_k - \varepsilon B(\phi_k, v) - \varepsilon^3 B(\phi_k, u_s)\right) \mathrm{d}\beta_t^k, \tag{14}$$

where, as for $\sigma$, we omit to write the dependence of $\phi_k$ on $t$. With these notations, (8) may indeed be rewritten as (9).

Let $(e_i)_{i \ge 0}$ be the Hilbertian basis of $H$ consisting of eigenvectors of $A$. We use the finite-dimensional orthogonal projector $P_n$, $n \in \mathbb{N}$, onto $\mathrm{Span}(e_0, \ldots, e_n)$ and the projected operators:

$$B^n := P_n B, \qquad F^n := P_n F, \qquad G^n := P_n G.$$

The Galerkin approximation of (9) is given by:

$$\begin{cases} \mathrm{d}_t v_n(t) + A v_n(t)\, \mathrm{d}t + B^n[v_n(t)]\, \mathrm{d}t + F^n[v_n(t)]\, \mathrm{d}t = G^n[v_n(t)]\, \mathrm{d}W_t, \\ v_n(0) = P_n(v_0). \end{cases}$$

This is a finite-dimensional system of stochastic differential equations with smooth coefficients. It therefore has a unique local solution and, by the estimate (17) below, this solution is global.
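For intuition, such a Galerkin system can be simulated directly. The sketch below is a toy, not the paper's system: the nonlinear terms $B^n$, $F^n$ are dropped, $A$ is a 1-D Laplacian in its eigenbasis, and the noise is additive with modes $\varepsilon \lambda_k^{-r}$ (the choice discussed above). The resulting vector SDE is integrated with an Euler-Maruyama scheme, and the supremum of the energy stays bounded, in the spirit of estimate (17).

```python
# Minimal sketch (assumed toy setting: linear drift, additive noise) of a
# Galerkin / Euler-Maruyama discretisation of an equation of the form (9).
# The spectral coefficients of v_n solve a finite-dimensional SDE.
import numpy as np

rng = np.random.default_rng(2)
n = 32                                     # Galerkin truncation level
lam = np.arange(1, n + 1) ** 2.0           # eigenvalues of A (1-D toy: lambda_k = k^2)
eps, r = 0.1, 2.0
noise_amp = eps * lam ** (-r)              # noise modes sigma = A^{-r}, cf. the text

dt, nsteps = 5e-4, 4000                    # lam.max()*dt < 1: explicit scheme stable
v = rng.normal(size=n)                     # spectral coefficients of v_n(0)
sup_energy = np.sum(v ** 2)
for _ in range(nsteps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    v = v - lam * v * dt + noise_amp * dW  # Euler-Maruyama step
    sup_energy = max(sup_energy, np.sum(v ** 2))
```

Here the dissipation dominates the weak noise, so the energy decays from its initial value and the pathwise supremum remains finite.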

We apply the Itô formula to the function $x \mapsto |x|_H^p$ for $p \ge 2$:

$$\begin{aligned} \mathrm{d}_t |v_n(t)|_H^p &= p\, |v_n(t)|_H^{p-2} \left(v_n(t),\ G^n(v_n(t))\, \mathrm{d}W_t\right)_H \\ &\quad - p\, |v_n(t)|_H^{p-2} \left(v_n(t),\ A v_n(t) + B^n v_n(t) + F^n v_n(t)\right)_H \mathrm{d}t \\ &\quad + \frac{p(p-2)}{2} \left(G^n v_n(t),\ v_n(t)\right)_H^2 |v_n(t)|_H^{p-4}\, \mathrm{d}t + \frac{p}{2} \left\|G^n v_n(t)\right\|_{\mathcal{L}_2(H)}^2 |v_n(t)|_H^{p-2}\, \mathrm{d}t. \end{aligned} \tag{15}$$

We have $(v_n(t), A v_n(t))_H = \frac{1}{Re} \|v_n(t)\|_V^2$, $(v_n(t), B^n v_n(t))_H = 0$ and

$$\begin{aligned} \left(v_n(t),\ F^n v_n(t)\right)_H &= \varepsilon^2 \left([v_n(t) \cdot \nabla] u_s,\ v_n(t)\right)_H - \frac{\varepsilon^2}{2} \left(v_n(t),\ \nabla \cdot (a \nabla v_n(t))\right)_H \\ &\quad - \frac{\varepsilon^4}{2} \left(v_n(t),\ \nabla \cdot (a \nabla u_s)\right)_H + \varepsilon^2 \left(A u_s,\ v_n(t)\right)_H + \varepsilon^2 \left(\partial_t u_s,\ v_n(t)\right)_H \\ &:= F_1^n + F_2^n + F_3^n + F_4^n + F_5^n. \end{aligned}$$

Under the assumption (12) in Theorem 1, we have the estimate:

$$\left|F_1^n + F_3^n + F_4^n + F_5^n\right| \le C \left(\varepsilon^2 + \varepsilon^4\right) |v_n(t)|_H^2 + C \left(\varepsilon^2 + \varepsilon^4\right),$$

with $C > 0$ a finite constant. By the definition of $a$, we have

$$F\_2^n = \frac{\varepsilon^2}{2} \sum\_{k=0}^{\infty} \left| (\phi\_k \cdot \nabla) v\_n(t) \right|\_{L^2(\mathcal{S})}^2.$$

Furthermore, using (11),

$$\frac{1}{2} \left\|G^n v_n(t)\right\|_{\mathcal{L}_2(H)}^2 \le \frac{\varepsilon^2}{2} \sum_{k=0}^{\infty} \left|(\phi_k \cdot \nabla) v_n(t)\right|_{L^2(\mathcal{S})}^2 + C \varepsilon^2 + 2 \varepsilon^2 \left|v_n(t)\right|_H^2,$$

and the first term corresponds exactly to $F_2^n$. Finally, using again (11),

$$\left(G^n v_n(t), v_n(t)\right)_H^2 \le C \left(\varepsilon^2 + \varepsilon^6\right) |v_n(t)|_H^2.$$

Hence

$$\begin{aligned} \mathrm{d}_t |v_n(t)|_H^p &+ \frac{p}{Re} |v_n(t)|_H^{p-2} \left\|v_n(t)\right\|_V^2 \mathrm{d}t \le p\, |v_n(t)|_H^{p-2} \left(v_n(t),\ G^n(v_n(t))\, \mathrm{d}W_t\right)_H \\ &+ C \left(\varepsilon^2 + \varepsilon^4\right) |v_n(t)|_H^p\, \mathrm{d}t + C \left[\left(\varepsilon^2 + \varepsilon^6\right) + \left(\varepsilon^2 + \varepsilon^4\right)\right] \mathrm{d}t, \end{aligned} \tag{16}$$

with $C > 0$ depending on $p$ (but not on $\varepsilon$ and $n$). We then use classical arguments, based in particular on the Burkholder-Davis-Gundy inequality, to deduce:

$$\frac{1}{2} \mathbb{E} \left[\sup_{0 \le t \le T} |v_n(t)|_H^p + \int_0^T |v_n(t)|_H^{p-2} \left\|v_n(t)\right\|_V^2 \mathrm{d}t\right] \le \mathbb{E} \left[|v_0|_H^p\right] + C \varepsilon^2. \tag{17}$$

Arguing as in [6], we prove that the laws $(\mathcal{L}(v_n))_n$ are tight in $L^2([0, T]; H)$ and in $C^0([0, T]; \mathcal{D}(A^{-3/2}))$.

By Skorokhod's embedding theorem, there exist a stochastic basis $(\bar{\Omega}, \bar{\mathcal{F}}, (\bar{\mathcal{F}}_t)_t, \bar{\mathbb{P}})$, $L^2([0, T]; H) \cap C^0([0, T]; \mathcal{D}(A^{-3/2}))$-valued random variables $\bar{v}_n$, $n \ge 1$, and $\bar{v}$, such that $\bar{v}_n$ has the same law as $v_n$ on $L^2([0, T]; H) \cap C^0([0, T]; \mathcal{D}(A^{-3/2}))$, and $C^0([0, T], U_0)$ cylindrical Wiener processes $\bar{W}^n$, $n \ge 1$, together with $\bar{W}$, such that (after thinning the sequences)

$$
\overline{v}\_n \to \overline{v} \text{ in } L^2([0, T]; \ H) \cap C^0([0, T]; \mathcal{D}(A^{-3/2})) \qquad \overline{\mathbb{P}} \text{ a.s.} \tag{18}
$$

$$
\overline{W}^n \to \overline{W} \text{ in } C^0([0, T], U\_0) \qquad \overline{\mathbb{P}} \text{ a.s.} \tag{19}
$$

For all integers $n$, $\bar{v}_n$ verifies

$$\overline{v}\_n(t) - P\_n(v\_0) + \int\_0^t \left[ A\overline{v}\_n(r) + B^n \overline{v}\_n(r) + F^n \overline{v}\_n(r) \right] \mathrm{d}r = \int\_0^t G^n(\overline{v}\_n(r)) \mathrm{d}\overline{W}\_r^n. \tag{20}$$

We may let $n \to \infty$ in this equation and prove that $\bar{v}$ verifies, for almost every $(t, \omega) \in [0, T] \times \bar{\Omega}$,

$$\overline{v}(t) - v\_0 + \int\_0^t \left( A\overline{v}(r) + B\overline{v}(r) + F\overline{v}(r) \right) \,\mathrm{d}r = \int\_0^t G(\overline{v}(r)) \,\mathrm{d}\overline{W}\_r \tag{21}$$

in the weak sense. For instance, let *w* be a smooth test function, then:

$$\int_0^t (B^n \bar{v}_n(r), w)_H\, \mathrm{d}r = \int_0^t b(\bar{v}_n(r), \bar{v}_n(r), w)\, \mathrm{d}r = -\int_0^t b(\bar{v}_n(r), w, \bar{v}_n(r))\, \mathrm{d}r,$$

and by the almost sure strong convergence in $L^2(0, T; H)$, this converges to $-\int_0^t b(\bar{v}(r), w, \bar{v}(r))\, \mathrm{d}r$ when $n \to \infty$.

It can be shown that (17) holds for $\bar{v}_n$ and, letting $n \to \infty$, we obtain a bound on $\bar{v}$. In particular, $\bar{v} \in L^2(\bar{\Omega}; L^2([0, T], V)) \cap L^2(\bar{\Omega}; L^{\infty}([0, T], H))$. We then use the mild form of the equation to prove that $\bar{v} \in C^0([0, T], H)$ almost surely.

For $d = 2$, we consider two solutions $v_1$ and $v_2$ of (9) on the same probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_t, \mathbb{P})$ and, using the Itô formula and classical estimates, prove that

$$\mathbb{E}\left[\sup\_{0\le r\le T} e(r) \left| (v\_1 - v\_2)(r) \right|\_H^2 \right] = 0,$$

where $e(t) := \exp\left(-\alpha \int_0^t \|v_2(r)\|_V^2\, \mathrm{d}r\right)$ for a well-chosen $\alpha$. As $\mathbb{E} \int_0^T \|v_2(r)\|_V^2\, \mathrm{d}r < \infty$, we deduce that $\mathbb{P}$-a.s., $v_1 = v_2$ for all $t \in [0, T]$. We have thus proved that pathwise uniqueness holds for $d = 2$. Then, using an argument due to Gyöngy and Krylov (see for instance [5], Sect. 5), we conclude that the whole sequence $(v_n)_n$ converges to the unique solution of (21).
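The role of the exponential weight can be sketched in one line (a formal computation, with the martingale terms discarded after taking expectations): for the difference $w = v_1 - v_2$, the energy estimate yields

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\,|w|_H^2 \le \alpha\,\|v_2\|_V^2\,|w|_H^2
\quad\Longrightarrow\quad
\frac{\mathrm{d}}{\mathrm{d}t}\Bigl(e(t)\,|w(t)|_H^2\Bigr) \le 0,
\qquad
e(t) = \exp\Bigl(-\alpha\int_0^t \|v_2(r)\|_V^2\,\mathrm{d}r\Bigr),
```

so $e(t)\,|w(t)|_H^2 \le |w(0)|_H^2 = 0$; since $e(t) > 0$ almost surely, $w \equiv 0$.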

Let $v_0 \in H$. For all $\varepsilon > 0$, we have proved that the abstract problem (9) admits martingale solutions $(v_{\varepsilon})_{\varepsilon>0}$. We then study whether $(v_{\varepsilon})_{\varepsilon>0}$ converges, when $\varepsilon \to 0^+$, to a solution $v$ of the following deterministic Navier-Stokes equation

$$\begin{cases} \mathrm{d}_t v(t) + A v(t)\, \mathrm{d}t + B v(t)\, \mathrm{d}t = 0, \\ v(0) = v_0. \end{cases} \tag{22}$$

When $d = 2$, the solution $v_{\varepsilon}$ is strong and unique. The deterministic Eq. (22) also admits a unique weak solution $v$. By classical estimates, we prove:

$$\mathbb{E}\_{\varepsilon} \left[ \sup\_{0 \le t \le T} e(t) \left| \upsilon\_{\varepsilon}(t) - \upsilon(t) \right|\_{H}^{2} \right] \xrightarrow[\varepsilon \to 0^{+}]{} \mathbf{0},$$

where $e(t) := \exp\left(-\alpha \int_0^t \|v(r)\|_V^2\, \mathrm{d}r\right)$ for some $\alpha > 0$.

When $d = 3$, inequality (17) shows that the laws $(\mathcal{L}(v_{\varepsilon_n}))_n$ are tight in $L^2([0, T]; H) \cap C^0([0, T]; \mathcal{D}(A^{-3/2}))$. Using the Skorokhod embedding theorem, we show that a subsequence converges in law to a weak solution of (22).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Stochastic Benjamin-Bona-Mahony Type Equation**

**Evgueni Dinvay**

**Abstract** Considered herein is a particular nonlinear dispersive stochastic equation. It was introduced recently in Dinvay and Mémin (Proc. R. Soc. A. 478:20220050, 2022), as a model describing surface water waves under location uncertainty. The corresponding noise term is introduced through a Hamiltonian formulation, which guarantees the energy conservation of the flow. Here the initial-value problem is studied.

**Keywords** Water waves · BBM equation · multiplicative noise

**2010 Mathematics Subject Classification** 35Q53, 35Q60, 60H15

#### **1 Introduction**

Consideration is given to the following Stratonovich one-dimensional BBM-type equation

$$\mathrm{d}u = -\partial_x K \left(u + K u^2\right) \mathrm{d}t + \sum_j \gamma_j\, \partial_x \left(u + K u^2\right) \circ \mathrm{d}W_j \tag{1}$$

introduced in [4], as a model describing surface waves of a fluid layer. It is supplemented with the initial condition *u(*0*)* = *u*0*.* Equation (1) has a Hamiltonian structure with the energy

$$\mathcal{H}(u) = \int\_{\mathbb{R}} \left( \frac{1}{2} \left( K^{-1/2} u \right)^2 + \frac{1}{3} u^3 \right) d\mathbf{x}. \tag{2}$$

E. Dinvay (-)

Inria Rennes - Bretagne Atlantique, Campus universitaire de Beaulieu Avenue du Général Leclerc, Rennes Cedex, France e-mail: Evgueni.Dinvay@inria.fr

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_3

The Fourier multiplier operator $K$, defined on the space of tempered distributions $\mathcal{S}'(\mathbb{R})$, has an even symbol of the form

$$K(\xi) \cong (1 + \xi^2)^{-\sigma\_0} \tag{3}$$

with $\sigma_0 > 1/2$. Expression (3) means that the symbol $K(\xi)$ is bounded from below and above by the right-hand side of (3) multiplied by some positive constants. In other words, the operator $K$ essentially behaves as the Bessel potential of order $2\sigma_0$, see [6]. The space variable is $x \in \mathbb{R}$ and the time variable is $t \ge 0$. The unknown $u$ is a real-valued function of these variables and of the probability variable $\omega \in \Omega$, representing the free-surface elevation of the fluid layer. The scalar sequence $\{\gamma_j\}$ satisfies the restriction $\sum_j \gamma_j^2 < \infty$, and $\{W_j\}$ is a sequence of independent scalar Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}, \mathbb{P})$.

Model (1) was introduced in [4], where an attempt was made to extend the elegant Hamiltonian formulation of [1] to the stochastic setting. We will just briefly comment on the methodology of [4]. The white noise is first introduced via the stochastic transport theory presented in [8], which is based on a splitting of the fluid particle motion into smooth and random movements. It is then restricted to a particular Stratonovich form in order to respect energy conservation. In particular, this provides a model with multiplicative noise of Hamiltonian structure. Finally, a long-wave approximation results in simplified models such as (1).

One may notice that, after discarding the nonlinear terms in Eq. (1) (the details can be seen in [4]), the corresponding linearised initial-value problem can be solved exactly with the help of the fundamental multiplier operator

$$\mathcal{S}(t, t_0) = \exp\left[-\partial_x K (t - t_0) + \sum_j \gamma_j\, \partial_x \left(W_j(t) - W_j(t_0)\right)\right],\tag{4}$$

where $t_0, t \in \mathbb{R}$. Note that it can be factorised as $\mathcal{S}(t, t_0) = S(t - t_0) S_W(t, t_0)$, where $S(t) = \exp(-\partial_x K t)$ is a unitary semi-group and $S_W$, containing all the randomness coming from the Wiener process, is unitary as well. They obviously commute as bounded differential operators. We recall that $S(t)$ is defined via the Fourier transform by $\mathfrak{F}(S(t)\psi)(\xi) = \exp(-i \xi K(\xi) t)\, \widehat{\psi}(\xi)$ for any $\psi \in \mathcal{S}'(\mathbb{R})$, where $\widehat{\psi} = \mathfrak{F}\psi$. Similarly, $S_W(t, t_0)$ is defined by the line

$$S\_W(t, t\_0)\psi = \mathfrak{F}^{-1}\left(\xi \mapsto \exp\left(i\xi \sum\_j \gamma\_j (W\_j(t) - W\_j(t\_0))\right) \widehat{\psi}(\xi)\right).$$

It allows us to represent (1) in the Duhamel form

$$u(t) = \mathcal{S}(t,0)\left(u\_0 + \int\_0^t \mathcal{S}(0,s)f(u(s))ds + \sum\_j \gamma\_j \int\_0^t \mathcal{S}(0,s)g(u(s))dW\_j(s)\right),\tag{5}$$

where

$$f(u) = -\partial_x K^2 u^2 + \sum_j \gamma_j^2\, \partial_x K \left(u\, \partial_x K u^2\right)$$

and

$$g(u) = \partial_x K u^2.$$
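The unitarity of $\mathcal{S}(t, t_0)$ can be illustrated numerically. The sketch below uses an assumed discretisation, not taken from the paper: $\sigma_0 = 1$ and three hypothetical coefficients $\gamma_j$. The operator acts in Fourier space as multiplication by a unimodular symbol, so the discrete $L^2$ norm of the sampled solution of the linearised equation is preserved pathwise.

```python
# Numerical sketch (assumed discretisation): the evolution operator
# S(t,t0) of the linearised equation acts in Fourier space as
# multiplication by exp(-i xi K(xi) t + i xi sum_j gamma_j W_j(t)),
# a unimodular symbol, hence it preserves the L2 norm pathwise.
import numpy as np

rng = np.random.default_rng(3)
N, L = 256, 40.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
xi = 2 * np.pi * np.fft.fftfreq(N, d=L / N)
K = (1.0 + xi ** 2) ** (-1.0)            # symbol of K with sigma0 = 1

t = 0.7
gammas = np.array([0.5, 0.3, 0.1])       # hypothetical gamma_j coefficients
W = rng.normal(0.0, np.sqrt(t), size=gammas.size)  # W_j(t) - W_j(0)
symbol = np.exp(-1j * xi * K * t + 1j * xi * np.dot(gammas, W))

u0 = 1.0 / np.cosh(x) ** 2               # smooth initial surface elevation
u = np.fft.ifft(symbol * np.fft.fft(u0))
norm_ratio = np.linalg.norm(u) / np.linalg.norm(u0)
```

Since $|{\exp(-i\xi K(\xi)t + i\xi \sum_j \gamma_j W_j)}| = 1$ for real $\xi$, the ratio of norms equals one up to round-off for every realization of the $W_j$.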

Existence and uniqueness of the solution to Eq. (5) is under consideration. It is worth pointing out that both $S_W$ and the stochastic integral in (5) are well defined. Indeed, appealing to Doob's inequalities for the submartingale $\sum_{j=n}^{n+m} \gamma_j W_j$ and the Itô-Nisio theorem, one can show that $\sum_j \gamma_j W_j$ converges uniformly in time almost surely, in probability and in the $L^2$ sense. If the integrand of the stochastic integral in (5) is in some Sobolev space $H^{\sigma}(\mathbb{R})$ for each $s$ and a.e. $\omega$, then we can understand this sum of integrals as an integration with respect to a $Q$-Wiener process associated with a Hilbert space $H$ and a non-negative symmetric trace-class operator $Q$ having eigenvalues $\gamma_j^2$ and eigenfunctions $e_j$ forming an orthonormal basis in $H$. The corresponding integrand is then the unbounded linear operator from $H$ to $H^{\sigma}(\mathbb{R})$ that maps every $e_j$ to the same element of $H^{\sigma}(\mathbb{R})$, namely to $\mathcal{S}(0, s) g(u(s))$. In particular, this explains why we need the summability condition $\sum_j \gamma_j^2 < \infty$.

Before formulating the main result, we introduce the following notation: $C(0, T; H^\sigma(\mathbb{R}))$ denotes the space of continuous functions on $[0, T]$ with values in $H^\sigma(\mathbb{R})$, equipped with the usual supremum norm.

**Theorem 1** *Let $\sigma\_0 > 1/2$ and $\sigma \geqslant \max\{\sigma\_0, 1\}$. Then for any $\mathcal{F}\_0$-measurable $u\_0 \in L^2(\Omega; H^\sigma(\mathbb{R})) \cap L^\infty(\Omega; H^{\sigma\_0}(\mathbb{R}))$ with sufficiently small $L^\infty H^{\sigma\_0}$-norm and any $T\_0 > 0$, Eq.* (5) *has a unique adapted solution $u \in L^2(\Omega; C(0, T\_0; H^\sigma(\mathbb{R}))) \cap L^\infty(\Omega; C(0, T\_0; H^{\sigma\_0}(\mathbb{R})))$. Moreover, $\mathcal{H}(u(t)) = \mathcal{H}(u\_0)$ for each $t \in [0, T\_0]$ almost surely on $\Omega$.*

The conservation of energy (2) plays a crucial role in the proof. It will therefore be more convenient to work with the energy norm defined by

$$\|u\|\_{\mathcal{H}}^2 = \frac{1}{2} \int\_{\mathbb{R}} \left( K^{-1/2} u \right)^2 dx$$

instead of the spatial *Hσ*<sup>0</sup> -norm. They are obviously equivalent.

The proof is essentially based on the contraction mapping principle. We make little use of the smoothing properties of the group $\mathcal{S}(t, t\_0)$, as is done, for example, in [2] for the analysis of a stochastic nonlinear Schrödinger equation. It is enough to know that the absolute value of its symbol equals one and that $S(t)$ is a unitary semigroup. However, in order to appeal to the fixed point theorem we have to truncate both the deterministic nonlinearity $f$ and the random one $g$. There are a couple of technical difficulties related to the implementation of the energy conservation in our case. Firstly, for the truncated equation we can claim $\mathcal{H}$-conservation only up to a particular stopping time. Secondly, one can control $\|u\|\_{\mathcal{H}}$ by $\mathcal{H}(u)$ only provided $\|u\|\_{\mathcal{H}}$ is small. These additional difficulties force us to repeat the arguments of the last section of the paper iteratively in order to construct a solution on the whole time interval $[0, T\_0]$.

As a final remark we point out that the noise in Eq. (1) can be gathered into the one-dimensional term $\partial\_x (u + K u^2) \circ dB$ with the scalar Brownian motion $B = \sum\_j \gamma\_j W\_j$. However, this does not affect the proof below in any way, so we stick to the original formulation (1). In future works we plan to extend it to $\gamma\_j$ being either Fourier multipliers or space-dependent coefficients.

#### **2 Truncation**

The Sobolev space $H^\sigma(\mathbb{R})$ consists of tempered distributions $u$ having the finite square norm $\|u\|\_{H^\sigma}^2 = \int |\widehat{u}(\xi)|^2 \left(1 + \xi^2\right)^\sigma d\xi < \infty$. Let $\theta \in C\_0^\infty(\mathbb{R})$ with $\operatorname{supp} \theta \subset [-2, 2]$ be such that $\theta(x) = 1$ for $x \in [-1, 1]$ and $0 \leqslant \theta(x) \leqslant 1$ for $x \in \mathbb{R}$. For $R > 0$ we introduce the cut-off $\theta\_R(x) = \theta(x/R)$ and

$$f\_R(u) = \theta\_R(\|u\|\_{H^\sigma})f(u), \quad g\_R(u) = \theta\_R(\|u\|\_{H^\sigma})g(u),$$

which we substitute into (5) in place of $f(u)$ and $g(u)$, respectively. The new $R$-regularisation of (5) reads

$$u(t) = \mathcal{S}(t, t\_0) \left( u(t\_0) + \int\_{t\_0}^t \mathcal{S}(t\_0, s) f\_R(u(s)) ds + \sum\_j \gamma\_j \int\_{t\_0}^t \mathcal{S}(t\_0, s) g\_R(u(s)) dW\_j(s) \right). \tag{6}$$

In this section we can set $t\_0 = 0$ and $u(t\_0) = u\_0$ without loss of generality; the time moment $t\_0$ will vary in the next section. Equation (6) can be solved with the help of the contraction mapping principle in $L^2(\Omega; C(0, T; H^\sigma(\mathbb{R})))$.
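A cut-off with the stated properties can be realised explicitly via the standard smooth transition function $h(s) = e^{-1/s}$ for $s > 0$. The following sketch is one common construction, an illustrative assumption rather than the paper's specific choice of $\theta$.

```python
import numpy as np

# Smooth cutoff: theta = 1 on [-1, 1], supp(theta) in [-2, 2], 0 <= theta <= 1.
def h(s):
    # C^infinity transition function: h(s) = exp(-1/s) for s > 0, else 0
    return np.where(s > 0, np.exp(-1.0 / np.maximum(s, 1e-300)), 0.0)

def theta(t):
    t = np.abs(t)                      # the cutoff is even
    return h(2.0 - t) / (h(2.0 - t) + h(t - 1.0))

def theta_R(t, R):
    # rescaled cutoff theta_R(t) = theta(t / R), equal to 1 for |t| <= R
    return theta(t / R)

print(theta(0.5), theta(1.0), theta(1.5), theta(2.5))
```

The denominator never vanishes, since at least one of $h(2 - |t|)$ and $h(|t| - 1)$ is positive for every $t$, so $\theta$ is indeed $C^\infty$ with $\theta = 1$ on $[-1, 1]$ and $\theta = 0$ outside $[-2, 2]$.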

**Proposition 1** *Let $\sigma > 1/2$, let $u\_0 \in L^2(\Omega; H^\sigma(\mathbb{R}))$ be $\mathcal{F}\_0$-measurable and let $T\_0 > 0$. Then* (6) *has a unique adapted solution $u \in L^2(\Omega; C(0, T\_0; H^\sigma(\mathbb{R})))$. Moreover, it depends continuously on the initial data $u\_0$.*

*Proof* We set $\mathcal{T}u(t) = \mathrm{RHS}(6)$. We will show that $\mathcal{T}$ is a contraction mapping in $X\_T = L^2(\Omega; C(0, T; H^\sigma(\mathbb{R})))$, provided $T > 0$ is sufficiently small, depending only on $R$. Let $u\_1, u\_2$ be two adapted processes in $X\_T$. Firstly, one can notice that

$$\begin{aligned} \|f\_R(u\_1) - f\_R(u\_2)\|\_{H^\sigma} &\leqslant C \left(1 + R\right)^2 \|u\_1 - u\_2\|\_{H^\sigma}, \\\\ \|g\_R(u\_1) - g\_R(u\_2)\|\_{H^\sigma} &\leqslant CR \left\|u\_1 - u\_2\right\|\_{H^\sigma}. \end{aligned}$$

Indeed, $H^\sigma(\mathbb{R})$ possesses the algebra property for $\sigma > 1/2$, and $\partial\_x K$ is bounded in $H^\sigma(\mathbb{R})$. Then, assuming $\|u\_2\|\_{H^\sigma} \leqslant \|u\_1\|\_{H^\sigma}$ without loss of generality, one deduces

$$\begin{split} \|g\_{R}(u\_{1}) - g\_{R}(u\_{2})\|\_{H^{\sigma}} &\leqslant C \left\|\theta\_{R}(\|u\_{1}\|\_{H^{\sigma}})u\_{1}^{2} - \theta\_{R}(\|u\_{2}\|\_{H^{\sigma}})u\_{2}^{2}\right\|\_{H^{\sigma}} \\ &\leqslant C\theta\_{R}(\|u\_{1}\|\_{H^{\sigma}})\left\|u\_{1}^{2} - u\_{2}^{2}\right\|\_{H^{\sigma}} + \left|\theta\_{R}(\|u\_{1}\|\_{H^{\sigma}}) - \theta\_{R}(\|u\_{2}\|\_{H^{\sigma}})\right| \left\|u\_{2}^{2}\right\|\_{H^{\sigma}} \\ &\leqslant CR \left\|u\_{1} - u\_{2}\right\|\_{H^{\sigma}}, \end{split}$$

where we have used the estimate $\left|\theta\_R(\|u\_1\|\_{H^\sigma}) - \theta\_R(\|u\_2\|\_{H^\sigma})\right| \leqslant \|\theta'\|\_{L^\infty} R^{-1} \|u\_1 - u\_2\|\_{H^\sigma}$, which follows from the mean value theorem. The difference between $f\_R(u\_1)$ and $f\_R(u\_2)$ is estimated in the same way. Thus

$$\begin{aligned} \|\mathcal{T}u\_1(t) - \mathcal{T}u\_2(t)\|\_{H^{\sigma}} &\leqslant \left\| \int\_0^t \mathcal{S}(0, s)(f\_R(u\_1(s)) - f\_R(u\_2(s)))ds \right\|\_{H^{\sigma}} \\ &+ \left\| \sum\_j \gamma\_j \int\_0^t \mathcal{S}(0, s)(g\_R(u\_1(s)) - g\_R(u\_2(s)))dW\_j(s) \right\|\_{H^{\sigma}} = I + II. \end{aligned}$$

The first integral is estimated straightforwardly as

$$I \leqslant \int\_0^T \|f\_R(u\_1(s)) - f\_R(u\_2(s))\|\_{H^{\sigma}} ds \leqslant C(1+R)^2 T \left\|u\_1 - u\_2\right\|\_{C(0,T;H^{\sigma})}.$$

The second one is estimated with the use of the Burkholder inequality [5] as

$$\mathbb{E} \sup\_{0 \leqslant t \leqslant T} II^2 \leqslant C \mathbb{E} \int\_0^T \|g\_R(u\_1(s)) - g\_R(u\_2(s))\|\_{H^\sigma}^2 \, ds \leqslant C R^2 T\, \mathbb{E} \|u\_1 - u\_2\|\_{C(0,T;H^\sigma)}^2.$$

It is clear that the time-continuity of $\mathcal{T}u\_1, \mathcal{T}u\_2$ follows from the factorisation $\mathcal{S} = S S\_W$ and the estimate $\|S\_W g\_R(u)\|\_{H^\sigma} \leqslant C R^2$, so we have a stochastic convolution as in [5, Lemma 3.3]. Thus

$$\|\mathcal{T}u\_1 - \mathcal{T}u\_2\|\_{X\_T} \lesssim C\left((1+R)^2T + R\sqrt{T}\right) \|u\_1 - u\_2\|\_{X\_T},$$

and so there exists a small $T$ depending only on $R$ such that $\mathcal{T}$ has a unique fixed point in $X\_T$. Moreover, this estimate also gives continuous dependence of the solution in $X\_T$ on the initial data $u\_0 \in L^2(\Omega; H^\sigma(\mathbb{R}))$. Clearly, the solution can be extended to the whole interval $[0, T\_0]$.
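The smallness-of-$T$ mechanism behind the contraction estimate can be illustrated on a toy deterministic analogue: for a globally Lipschitz nonlinearity, the Picard map $(\mathcal{T}u)(t) = u\_0 + \int\_0^t f(u(s))\, ds$ contracts in $C(0, T)$ once $LT < 1$, mirroring the factor $C((1+R)^2 T + R\sqrt{T})$ above. The sketch below is a hypothetical scalar example (the nonlinearity $\tanh$, the horizon $T$ and the grid are assumptions), not the paper's equation.

```python
import numpy as np

# Picard iteration for u(t) = u0 + int_0^t tanh(u(s)) ds on [0, T], L*T < 1.
f = np.tanh                  # Lipschitz constant L = 1
T, n = 0.5, 2001             # contraction factor ~ L*T = 0.5
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]
u0 = 1.0

def picard(u):
    # (T u)(t) = u0 + cumulative trapezoidal integral of f(u) up to t
    integrand = f(u)
    integral = np.concatenate(
        ([0.0], np.cumsum((integrand[1:] + integrand[:-1]) / 2.0 * dt))
    )
    return u0 + integral

u = np.full(n, u0)
for _ in range(30):          # geometric convergence: error ~ (L*T)^k
    u = picard(u)

res = np.max(np.abs(u - picard(u)))   # residual of the discrete fixed point
print(res)
```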

The regularisation affects the energy conservation. Indeed, in the Itô differential form Eq. (6) reads

$$\begin{split} du &= \left( -\partial\_{\mathbf{x}} K u + \frac{1}{2} \sum\_{j} \gamma\_{j}^{2} \partial\_{\mathbf{x}}^{2} u + f\_{R}(u) + \sum\_{j} \gamma\_{j}^{2} \partial\_{\mathbf{x}} g\_{R}(u) \right) dt \\ &+ \sum\_{j} \gamma\_{j} \left( \partial\_{\mathbf{x}} u + g\_{R}(u) \right) dW\_{j}, \end{split} \tag{7}$$

and so applying the Itô formula to the energy functional H*(u(t))* defined by (2) with the use of (7), one can easily obtain

$$d\mathcal{H}(u) = \left( (\theta\_R - 1) \int u^2 \partial\_x K u\, dx + \theta\_R \left( \theta\_R - 1 \right) \sum\_j \gamma\_j^2 \int \left( \frac{1}{2} g(u) K^{-1} g(u) + u g^2(u) \right) dx \right) dt. \tag{8}$$

Indeed, assuming $\sigma \geqslant \sigma\_0 + 2$ at first, we notice that the solution $u$ given by Proposition 1 solves Eq. (7). Let us introduce the following notation:

$$
\Psi(t)dt + \Phi(t)dW = \Psi(t)dt + \sum\_{j} \gamma\_j \Phi(t)e\_j dW\_j = \text{RHS}(7).
$$

Then Itô's formula reads

$$\begin{aligned} \mathcal{H}(\boldsymbol{u}(t)) &= \mathcal{H}(\boldsymbol{u}\_{0}) + \int\_{0}^{t} \partial\_{\boldsymbol{u}} \mathcal{H}(\boldsymbol{u}(s)) \boldsymbol{\Psi}(\boldsymbol{s}) \mathrm{d}s + \int\_{0}^{t} \partial\_{\boldsymbol{u}} \mathcal{H}(\boldsymbol{u}(s)) \boldsymbol{\Phi}(\boldsymbol{s}) \mathrm{d}W(\boldsymbol{s}) \\ &+ \frac{1}{2} \int\_{0}^{t} \mathrm{tr} \, \partial\_{\boldsymbol{u}}^{2} \mathcal{H}(\boldsymbol{u}(s)) (\boldsymbol{\Phi}(\boldsymbol{s}), \boldsymbol{\Phi}(\boldsymbol{s})) \mathrm{d}s, \end{aligned}$$

where the Fréchet derivatives are defined by

$$\begin{aligned} \partial\_{u} \mathcal{H}(u)\phi &= \int\_{\mathbb{R}} \left( K^{-1/2} u\, K^{-1/2} \phi + u^2 \phi \right) dx, \\ \partial\_{u}^2 \mathcal{H}(u)(\phi, \psi) &= \int\_{\mathbb{R}} \left( K^{-1/2} \phi\, K^{-1/2} \psi + 2u \phi \psi \right) dx \end{aligned}$$

at every $\phi, \psi \in H^{\sigma\_0}(\mathbb{R})$. Substituting these expressions together with the definitions of $\Psi$ and $\Phi$ into Itô's formula, one obtains (8). Let us, for example, calculate the stochastic integral

$$\int\_0^t \partial\_u \mathcal{H}(u(s)) \Phi(s)\, dW(s) = \sum\_j \gamma\_j \int\_0^t \int\_{\mathbb{R}} \left( K^{-1/2} u\, K^{-1/2} + u^2 \right) \left( \partial\_x u + \theta\_R(\|u\|\_{H^\sigma})\, \partial\_x K u^2 \right) dx\, dW\_j$$

that equals zero, as one can see by integrating by parts in the space integral. Similarly, one calculates the other two integrals in the Itô formula. Thus we have proved (8) for $\sigma \geqslant \sigma\_0 + 2$. In order to lower the bound for $\sigma$, one would like to argue by approximation of the initial value $u\_0$ via smooth functions and appeal to the continuous dependence on $u\_0$; however, there is a problem here, since $\theta\_R$ in (8) depends on $\sigma$. So even for smooth initial data the corresponding solution lies a priori only in $H^\sigma$. This difficulty is overcome in the next statement, where we argue similarly to [3].

**Proposition 2** *Let $\sigma\_0 > 1/2$ and $\sigma \geqslant \max\{\sigma\_0, 1\}$. Then* (8) *holds almost surely for $u$ satisfying Eq.* (6) *given by Proposition 1.*

*Proof* The main idea is to cut off high frequencies of the differential operator $\partial\_x$ in (7) as follows. Let $P\_\lambda$ be the Fourier multiplier with the symbol $\theta\_\lambda$, $\lambda > 0$, defined by $\mathfrak{F}(P\_\lambda \psi) = \theta\_\lambda \widehat{\psi}$. Now we consider, instead of (7), the following regularisation

$$\begin{split} du &= \left( -\partial\_x K u + \frac{1}{2} \sum\_{j} \gamma\_{j}^{2} \partial\_{x}^{2} P\_{\lambda}^{2} u + f\_{R}(u) + \sum\_{j} \gamma\_{j}^{2} \partial\_{x} P\_{\lambda} g\_{R}(u) \right) dt \\ &\quad + \sum\_{j} \gamma\_{j} \left( \partial\_{x} P\_{\lambda} u + g\_{R}(u) \right) dW\_{j} \end{split} \tag{9}$$

that has a strong solution. Indeed, it contains only bounded operators, and the corresponding mild equation has exactly the same form as Eq. (6) with $\mathcal{S}^\lambda = S S\_W^\lambda$ in place of $\mathcal{S}$, where

$$S\_W^\lambda = \exp\left[\sum\_j \gamma\_j \partial\_\mathbf{x} P\_\lambda(W\_j(t) - W\_j(t\_0))\right].$$

So we can apply Proposition 1 to obtain $u = u\_\lambda$ solving (9). Let $u = u\_\infty$ stand for the solution of the original Eq. (6). Firstly, we will check that $u\_\lambda \to u\_\infty$ in $L^2(\Omega; L^2(0, T\_0; H^\sigma(\mathbb{R})))$ for any $\sigma > 1/2$ as $\lambda \to \infty$.

Let $0 \leqslant t \leqslant T \leqslant T\_0$, where a sufficiently small positive time moment $T$ is to be chosen below. Then

$$\begin{aligned} \|\boldsymbol{u}\_{\lambda}(t) - \boldsymbol{u}\_{\infty}(t)\|\_{H^{\sigma}} &= \left\|\boldsymbol{\mathcal{T}}^{\lambda}\boldsymbol{u}\_{\lambda}(t) - \boldsymbol{\mathcal{T}}^{\infty}\boldsymbol{u}\_{\infty}(t)\right\|\_{H^{\sigma}} \\ &\leqslant \left\|\left(\boldsymbol{\mathcal{S}}^{\lambda}(t,0) - \boldsymbol{\mathcal{S}}^{\infty}(t,0)\right)\boldsymbol{u}\_{0}\right\|\_{H^{\sigma}} \\ &\quad + \left\|\int\_{0}^{t} \left(\boldsymbol{\mathcal{S}}^{\lambda}(t,s) - \boldsymbol{\mathcal{S}}^{\infty}(t,s)\right) f\_{R}(\boldsymbol{u}\_{\infty}(s)) ds\right\|\_{H^{\sigma}} \\ &\quad + \left\|\int\_{0}^{t} \boldsymbol{\mathcal{S}}^{\lambda}(t,s)(f\_{R}(\boldsymbol{u}\_{\lambda}(s)) - f\_{R}(\boldsymbol{u}\_{\infty}(s))) ds\right\|\_{H^{\sigma}} \end{aligned}$$

$$\begin{aligned} &+\left\| \left( \mathcal{S}^{\lambda}(t,0) - \mathcal{S}^{\infty}(t,0) \right) \sum\_{j} \gamma\_{j} \int\_{0}^{t} \mathcal{S}^{\infty}(0,s) g\_{R}(u\_{\infty}(s)) dW\_{j}(s) \right\|\_{H^{\sigma}} \\ &+ \left\| \sum\_{j} \gamma\_{j} \int\_{0}^{t} \left( \mathcal{S}^{\lambda}(0,s) - \mathcal{S}^{\infty}(0,s) \right) g\_{R}(u\_{\infty}(s)) dW\_{j}(s) \right\|\_{H^{\sigma}} \\ &+ \left\| \sum\_{j} \gamma\_{j} \int\_{0}^{t} \mathcal{S}^{\lambda}(0,s) (g\_{R}(u\_{\lambda}(s)) - g\_{R}(u\_{\infty}(s))) dW\_{j}(s) \right\|\_{H^{\sigma}} \\ &= I\_{1} + \dots + I\_{6}. \end{aligned}$$

The terms *I*<sup>3</sup> and *I*<sup>6</sup> are estimated exactly as the analogous integrals *I* and *I I* in the proof of Proposition 1, namely,

$$I\_3 \leqslant C(1+R)^2 \sqrt{T} \left\| u\_\lambda - u\_\infty \right\|\_{L^2(0,T;H^\sigma)}$$

and

$$\mathbb{E}\sup\_{0 \leqslant t \leqslant T} I\_6^2 \leqslant C\mathbb{E}\int\_0^T \left\|g\_R(u\_\lambda(s)) - g\_R(u\_\infty(s))\right\|\_{H^\sigma}^2 ds \leqslant C R^2\, \mathbb{E}\left\|u\_\lambda - u\_\infty\right\|\_{L^2(0,T;H^\sigma)}^2.$$

Thus

$$\mathbb{E} \int\_0^T \left( I\_3^2 + I\_6^2 \right) dt \leqslant C \left( (1+R)^4 T^2 + R^2 T \right) \mathbb{E} \left\| u\_\lambda - u\_\infty \right\|\_{L^2(0,T;H^\sigma)}^2,$$

and so there exists a small *T >* 0 depending only on *R* such that

$$\mathbb{E}\left\|u\_{\lambda}-u\_{\infty}\right\|\_{L^{2}(0,T;H^{\sigma})}^{2} \leqslant C\mathbb{E}\int\_{0}^{T}\left(I\_{1}^{2}+I\_{2}^{2}+I\_{4}^{2}+I\_{5}^{2}\right)dt.$$

One needs to show that the right-hand side of this expression tends to zero as $\lambda \to \infty$. All four integrals are treated similarly. Indeed, let us look more closely at the first one,

$$I\_1^2 = \int \left| \exp\left(i\xi\theta\_\lambda(\xi)\sum\_j \gamma\_j W\_j(t)\right) - \exp\left(i\xi\sum\_j \gamma\_j W\_j(t)\right) \right|^2 |\widehat{u\_0}(\xi)|^2 \left(1 + \xi^2\right)^\sigma d\xi$$

that obviously tends to zero as $\lambda \to \infty$ for a.e. $\omega$ and any $t$. Hence $\mathbb{E}\int\_0^T I\_1^2\, dt \to 0$ by the dominated convergence theorem, since $I\_1 \leqslant 2\|u\_0\|\_{H^\sigma}$. The integral of $I\_4^2$ is estimated in exactly the same manner, with the stochastic integral of $\mathcal{S}^\infty g\_R(u\_\infty)$ standing in place of $u\_0$. The second integral satisfies

$$\mathbb{E} \int\_0^T I\_2^2 dt \leqslant T \mathbb{E} \int\_0^T \int\_0^T \left\| \left( \mathcal{S}^\lambda(t, s) - \mathcal{S}^\infty(t, s) \right) f\_R(u\_\infty(s)) \right\|\_{H^\sigma}^2 ds dt \to 0$$

by the dominated convergence theorem, since here $\|\ldots\|\_{H^\sigma}^2 \leqslant C R^2 (1+R)^4$. Finally, the last integral satisfies

$$\mathbb{E} \int\_0^T I\_5^2\, dt \leqslant T\, \mathbb{E} \sup\_{t \in [0,T]} I\_5^2 \leqslant C T\, \mathbb{E} \int\_0^T \left\| \left( \mathcal{S}^\lambda(0,s) - \mathcal{S}^\infty(0,s) \right) g\_R(u\_\infty(s)) \right\|\_{H^\sigma}^2 ds \to 0$$

by the Burkholder inequality and the dominated convergence theorem, since here $\|\ldots\|\_{H^\sigma}^2 \leqslant C R^4$.

Repeating this argument iteratively on subintervals of $[0, T\_0]$ of size $T$, one obtains that $u\_\lambda \to u\_\infty$ in $L^2(\Omega \times [0, T\_0]; H^\sigma(\mathbb{R}))$.

Let us calculate each term in the Itô formula for $u = u\_\lambda$. As we shall see, the corresponding stochastic integral is not zero; moreover, it is difficult to pass to the limit $\lambda \to \infty$ in the stochastic part. So instead of $\mathcal{H}$ we first consider a sequence $\mathcal{H}\_n$, $n \in \mathbb{N}$, with the cubic term cut off in the following way:

$$\mathcal{H}\_n(u) = \|u\|\_{\mathcal{H}}^2 + \frac{1}{3}\theta\_n\left(\|u\|\_{\mathcal{H}}^2\right)\int u^3 dx$$

that clearly tends to H*(u)* almost surely at any fixed time moment. The corresponding Fréchet derivatives are defined by

$$\partial\_{u} \mathcal{H}\_{n}(u) \phi = \int\_{\mathbb{R}} \left[ \left( 1 + \frac{1}{3} \theta\_{n}' \left( \|u\|\_{\mathcal{H}}^{2} \right) \int u^{3} dy \right) K^{-1/2} u\, K^{-1/2} \phi + \theta\_{n} \left( \|u\|\_{\mathcal{H}}^{2} \right) u^{2} \phi \right] dx,$$

$$\begin{split} \partial\_{u}^{2} \mathcal{H}\_{n}(u)(\phi,\psi) &= \int\_{\mathbb{R}} \left[ \left( 1 + \frac{1}{3} \theta\_{n}' \left( \|u\|\_{\mathcal{H}}^{2} \right) \int u^{3} dy \right) K^{-1/2} \phi\, K^{-1/2} \psi + 2 \theta\_{n} \left( \|u\|\_{\mathcal{H}}^{2} \right) u \phi \psi \right] dx \\ &\quad + \theta\_{n}' \left( \|u\|\_{\mathcal{H}}^{2} \right) \left( \int u^{2} \phi\, dx \int K^{-1/2} u\, K^{-1/2} \psi\, dy + \int u^{2} \psi\, dx \int K^{-1/2} u\, K^{-1/2} \phi\, dy \right) \\ &\quad + \frac{1}{3} \theta\_{n}'' \left( \|u\|\_{\mathcal{H}}^{2} \right) \int u^{3} dx \int K^{-1/2} u\, K^{-1/2} \phi\, dy \int K^{-1/2} u\, K^{-1/2} \psi\, dz \end{split}$$

at every $\phi, \psi \in H^{\sigma\_0}(\mathbb{R})$. Substituting these into the stochastic integral, one obtains the following expression, which can be simplified by integration by parts:

$$\begin{split} &\int\_{0}^{t} \partial\_{u} \mathcal{H}\_{n}(u(s)) \Phi(s)\, dW(s) \\ &= \sum\_{j} \gamma\_{j} \int\_{0}^{t} \int\_{\mathbb{R}} \left[ \left( 1 + \frac{1}{3} \theta\_{n}' \left( \|u\|\_{\mathcal{H}}^{2} \right) \int u^{3} dy \right) K^{-1/2} u\, K^{-1/2} + \theta\_{n} \left( \|u\|\_{\mathcal{H}}^{2} \right) u^{2} \right] \left( \partial\_{x} P\_{\lambda} u + \theta\_{R}(\|u\|\_{H^{\sigma}})\, \partial\_{x} K u^{2} \right) dx\, dW\_{j} \\ &= \sum\_{j} \gamma\_{j} \int\_{0}^{t} \theta\_{n} \left( \|u\|\_{\mathcal{H}}^{2} \right) \int\_{\mathbb{R}} u^{2}\, \partial\_{x} P\_{\lambda} u\, dx\, dW\_{j}, \end{split}$$

where *u* = *uλ*. We will show that this integral tends to zero as *λ* → ∞. That is exactly the place where we need the cut off *θn*. Applying some algebraic manipulations to the space integral and the Burkholder inequality to the stochastic integral, one deduces the estimate

$$\begin{split} &\mathbb{E}\sup\_{0\leqslant t\leqslant T\_{0}}\left|\int\_{0}^{t}\partial\_{\boldsymbol{u}}\mathcal{H}\_{n}(\boldsymbol{u}(s))\Phi(s)dW(s)\right|^{2} \\ &\leqslant C\mathbb{E}\int\_{0}^{T\_{0}}\theta\_{n}^{2}\left(\left\lVert\boldsymbol{u}\_{\lambda}(t)\right\rVert\_{\mathcal{H}}^{2}\right)\left(\int\_{\mathbb{R}}\boldsymbol{u}\_{\lambda}^{2}(t)\partial\_{\boldsymbol{x}}(P\_{\lambda}-1)\boldsymbol{u}\_{\lambda}(t)dx\right)^{2}dt \\ &\leqslant C\mathbb{E}\int\_{0}^{T\_{0}}\theta\_{n}^{2}\left(\left\lVert\boldsymbol{u}\_{\lambda}(t)\right\rVert\_{\mathcal{H}}^{2}\right)\left\lVert\boldsymbol{u}\_{\lambda}(t)\right\rVert\_{\mathcal{H}}^{4} \\ &\qquad\left(\left\lVert(P\_{\lambda}-1)\boldsymbol{u}\_{\infty}(t)\right\rVert\_{H^{1/2}}^{2}+\left\lVert(P\_{\lambda}-1)(\boldsymbol{u}\_{\lambda}(t)-\boldsymbol{u}\_{\infty}(t))\right\rVert\_{H^{1/2}}^{2}\right)dt \\ &\leqslant C n^{4}\mathbb{E}\int\_{0}^{T\_{0}}\left(\left\lVert(P\_{\lambda}-1)\boldsymbol{u}\_{\infty}(t)\right\rVert\_{H^{1/2}}^{2}+\left\lVert(\boldsymbol{u}\_{\lambda}(t)-\boldsymbol{u}\_{\infty}(t))\right\rVert\_{H^{1/2}}^{2}\right)dt\to 0 \end{split}$$

as $\lambda \to \infty$ for each fixed $n \in \mathbb{N}$. Note that the use of the functional $\mathcal{H}\_n$ instead of $\mathcal{H}$ is important here. Similarly, we calculate the remaining two terms in the Itô formula:

$$\begin{split} &\partial\_{u} \mathcal{H}\_{n}(u) \Psi + \frac{1}{2} \operatorname{tr} \partial\_{u}^{2} \mathcal{H}\_{n}(u)(\Phi, \Phi) \\ &= (\theta\_{R} - \theta\_{n}) \int u^{2} \partial\_{x} K u\, dx + \theta\_{n} \theta\_{R} \left(\theta\_{R} - 1\right) \sum\_{j} \gamma\_{j}^{2} \int u g^{2}(u)\, dx \\ &\quad + \frac{\theta\_{R} (\theta\_{R} - 1)}{2} \sum\_{j} \gamma\_{j}^{2} \int g(u) K^{-1} g(u)\, dx \\ &\quad + \frac{\theta\_{n}}{2} \sum\_{j} \gamma\_{j}^{2} \int \left( u^{2} \partial\_{x}^{2} P\_{\lambda}^{2} u + 2 u (\partial\_{x} P\_{\lambda} u)^{2} \right) dx \\ &\quad + \theta\_{n} \theta\_{R} \sum\_{j} \gamma\_{j}^{2} \left( 2 \int (\partial\_{x} P\_{\lambda} u)\, g(u)\, dx - \int g(u) P\_{\lambda} K^{-1} g(u)\, dx \right) \\ &\quad + \frac{1}{3} \theta\_{R} \theta\_{n}' \int u^{3} dy \left( \frac{\theta\_{R} - 1}{2} \sum\_{j} \gamma\_{j}^{2} \int g(u) K^{-1} g(u)\, dx - \int u g(u)\, dx \right) \\ &= J\_{1} + \dots + J\_{6}, \end{split}$$

where, as above, $u = u\_\lambda$. One can prove that for a.e. $\omega \in \Omega$ and $t \in [0, T\_0]$ the first three terms $J\_1 + J\_2 + J\_3$ tend to the integrand of the right-hand side of (8) in the subsequent limits, firstly as $\lambda \to \infty$ and then as $n \to \infty$. Both $J\_4$ and $J\_5$ tend to zero as $\lambda \to \infty$. Meanwhile, the last term $J\_6$ stays bounded by $C/n$, and so $\lim\_{n \to \infty} \lim\_{\lambda \to \infty} J\_6 = 0$. Let us show, for example, that $J\_4 \to 0$, which is the most troublesome term in the sum, since this is the only place in the paper where we make use of the fact $\sigma \geqslant 1$. The rest are treated similarly without this additional restriction. Indeed,

$$J\_4 \leqslant C \left| \int \left( u\, \partial\_x P\_\lambda u - P\_\lambda (u\, \partial\_x u) \right) (P\_\lambda - 1) \partial\_x u\, dx \right| \leqslant C \left\| u\_\lambda \right\|\_{H^1}^2 \left( \left\| (P\_\lambda - 1) u\_\infty \right\|\_{H^1} + \left\| u\_\lambda - u\_\infty \right\|\_{H^1} \right)$$

that obviously tends to zero as *λ* → ∞. This concludes the proof.

At this stage one cannot claim the energy conservation yet, so we will prove a weaker result that will be sharpened later. Note that there exists *C*<sup>H</sup> *>* 0 such that

$$\|u\|\_{\mathcal{H}}^2 \left(1 - C\_{\mathcal{H}} \|u\|\_{\mathcal{H}}\right) \leqslant \mathcal{H}(u) \leqslant \|u\|\_{\mathcal{H}}^2 \left(1 + C\_{\mathcal{H}} \|u\|\_{\mathcal{H}}\right),\tag{10}$$

following from the well-known embedding $H^{\sigma\_0}(\mathbb{R}) \hookrightarrow L^\infty(\mathbb{R})$; recall that $\sigma\_0 > 1/2$.
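In more detail — a sketch under the definitions above, with $\mathcal{H}(u) = \|u\|\_{\mathcal{H}}^2 + \frac{1}{3}\int u^3\, dx$ as in the definition of $\mathcal{H}\_n$ below — the cubic term admits the bound

$$\left| \frac{1}{3} \int\_{\mathbb{R}} u^3\, dx \right| \leqslant \frac{1}{3} \|u\|\_{L^\infty} \|u\|\_{L^2}^2 \leqslant C \|u\|\_{H^{\sigma\_0}}^3 \leqslant C\_{\mathcal{H}} \|u\|\_{\mathcal{H}}^3,$$

where the middle step uses the embedding and the last one the equivalence of the $\mathcal{H}$- and $H^{\sigma\_0}$-norms; this yields both inequalities in (10) at once.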

**Lemma 1** *There exists a constant $T\_1 > 0$ independent of $\omega$ such that if $u$ solving Eq.* (6) *satisfies $\|u\|\_{\mathcal{H}} \leqslant \frac{1}{2 C\_{\mathcal{H}}}$ on some interval $[0, \tau]$, then $\mathcal{H}(u) \leqslant 2\mathcal{H}(u(0))$ on $[0, T\_1 \wedge \tau]$.*

*Proof* At first one can notice that, as long as $\|u\|\_{\mathcal{H}}$ stays bounded by $(2 C\_{\mathcal{H}})^{-1}$, we have

$$\frac{1}{2} \left\| u \right\|\_{\mathcal{H}}^2 \leqslant \mathcal{H}(u) \leqslant \frac{3}{2} \left\| u \right\|\_{\mathcal{H}}^2.$$

Moreover, one can easily deduce from (8) the bound

$$\mathcal{H}(u(t)) \lesssim \mathcal{H}(u(0)) + C \int\_0^t \mathcal{H}(u(s))ds,$$

and so the proof is concluded by Grönwall's lemma.
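In explicit form — a short sketch with $C$ the constant from the bound above — Grönwall's lemma gives

$$\mathcal{H}(u(t)) \leqslant \mathcal{H}(u(0))\, e^{C t}, \quad t \in [0, T\_1 \wedge \tau],$$

so any choice $T\_1 \leqslant C^{-1} \log 2$ yields $\mathcal{H}(u(t)) \leqslant 2 \mathcal{H}(u(0))$ there; since $C$ does not depend on $\omega$, neither does $T\_1$.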

#### **3 Proof of the Main Result**

We construct a solution $u$ of (5) iteratively on the intervals $[0, T\_1], [T\_1, 2T\_1]$ and so on. Here the interval size $T\_1$ is given by Lemma 1. Staying under the assumptions of Theorem 1, we denote by $u\_m$ the solutions of Eq. (6) with $R = m \in \mathbb{N}$ given by Proposition 1, where we subsequently set $t\_0 = 0, T\_1, 2T\_1, \ldots$. We define the stopping times

$$\tau\_m = \tau\_m^{t\_0} = \inf\left\{ t \in [t\_0, T\_0] : \|u\_m(t)\|\_{H^\sigma} > m \right\} \tag{11}$$

with the agreement $\inf \emptyset = T\_0$. Starting with $t\_0 = 0$, we first show the following result.

**Lemma 2** *For a.e. $\omega \in \Omega$, any $m \in \mathbb{N}$ and each $t \in [0, \tau]$ with $\tau(\omega) = \min\{\tau\_m(\omega), \tau\_{m+1}(\omega)\}$, it holds true that $u\_m(t) = u\_{m+1}(t)$.*

*Proof* We define

$$
\widetilde{u}\_{i}(t) = \begin{cases}
 u\_{i}(t) & \text{if } t \in [0, \tau], \\
 \mathcal{S}(t, \tau)u\_{i}(\tau) & \text{if } t \in [\tau, T\_{0}],
\end{cases} \quad i = m, m+1.
$$

First we will show that $\widetilde{u}\_m$ and $\widetilde{u}\_{m+1}$ coincide in $X\_T$ provided $T$ is sufficiently small; then we will finish the proof by an iteration procedure. The difference of these functions has the form

$$\begin{aligned} \widetilde{\boldsymbol{u}}\_{m+1}(t) - \widetilde{\boldsymbol{u}}\_{m}(t) &= \mathcal{S}(t, \boldsymbol{0}) \int\_{0}^{t \wedge \tau} \mathcal{S}(\boldsymbol{0}, \boldsymbol{s}) \left( f(\widetilde{\boldsymbol{u}}\_{m+1}(\boldsymbol{s})) - f(\widetilde{\boldsymbol{u}}\_{m}(\boldsymbol{s})) \right) d\boldsymbol{s} \\ &+ \mathcal{S}(t, \boldsymbol{0}) \sum\_{j} \boldsymbol{\gamma}\_{j} \int\_{0}^{t \wedge \tau} \mathcal{S}(\boldsymbol{0}, \boldsymbol{s}) \left( g(\widetilde{\boldsymbol{u}}\_{m+1}(\boldsymbol{s})) - g(\widetilde{\boldsymbol{u}}\_{m}(\boldsymbol{s})) \right) d\boldsymbol{W}\_{j}(\boldsymbol{s}), \end{aligned}$$

where the stochastic integral is estimated via

$$\begin{split} &\mathbb{E} \sup\_{0 \leqslant t \leqslant T} \left\| S\_W(t, 0) \sum\_j \gamma\_j \int\_0^t S(t-s)\, \chi\_{\{s \leqslant \tau\}}(s)\, S\_W(0, s) \left( g(\widetilde{u}\_{m+1}(s)) - g(\widetilde{u}\_m(s)) \right) dW\_j(s) \right\|\_{H^\sigma}^2 \\ &\leqslant C \mathbb{E} \int\_0^T \chi\_{\{s \leqslant \tau\}}(s) \left\| S\_W(0, s) \left( g(\widetilde{u}\_{m+1}(s)) - g(\widetilde{u}\_m(s)) \right) \right\|\_{H^\sigma}^2 ds \\ &\leqslant C \mathbb{E} \int\_0^T \chi\_{\{s \leqslant \tau\}}(s) \left( \|\widetilde{u}\_{m+1}(s)\|\_{H^\sigma} + \|\widetilde{u}\_m(s)\|\_{H^\sigma} \right)^2 \left\| \widetilde{u}\_{m+1}(s) - \widetilde{u}\_m(s) \right\|\_{H^\sigma}^2 ds \\ &\leqslant C (2m+1)^2 T\, \mathbb{E} \sup\_{[0, T]} \left\| \widetilde{u}\_{m+1} - \widetilde{u}\_m \right\|\_{H^\sigma}^2 \end{split}$$

with the help of the Burkholder inequality for the convolution with the unitary group $S$; see [5, Lemma 3.3]. The first integral is estimated more straightforwardly (note the similar argument applied to $I$ in the proof of Proposition 1), and so one obtains

$$\|\widetilde{u}\_{m+1} - \widetilde{u}\_m\|\_{X\_T} \lesssim C(m)\sqrt{T} \left\|\widetilde{u}\_{m+1} - \widetilde{u}\_m\right\|\_{X\_T}.$$

Hence $\widetilde{u}\_{m+1} = \widetilde{u}\_m$ on $[0, T]$ for a.e. $\omega \in \Omega$, provided $T$ is chosen sufficiently small depending only on $m$. Thus we can iterate this procedure to show that $\widetilde{u}\_{m+1} = \widetilde{u}\_m$ on the whole interval $[0, T\_0]$, which concludes the proof of the lemma.

Our goal is to bound $\|u\_m\|\_{L^2 C(0, T\_1; H^\sigma)}$ by a constant independent of $m \in \mathbb{N}$, and so we will need, in particular, to estimate $\|f(u\_m)\|\_{H^\sigma}$ and $\|g(u\_m)\|\_{H^\sigma}$. This can easily be done with the help of

$$\|\phi\psi\|\_{H^{\sigma}} \leqslant C(\sigma, \sigma\_0) \left(\|\phi\|\_{H^{\sigma}} \|\psi\|\_{H^{\sigma\_0}} + \|\phi\|\_{H^{\sigma\_0}} \|\psi\|\_{H^{\sigma}}\right),$$

which holds for any $\sigma \geqslant 0$ and $\sigma\_0 > 1/2$; see for example [7, Estimate (3.12)].

For a.e. $\omega \in \Omega$ and any $m \in \mathbb{N}$, $t \in [0, T\_0]$ we have

$$\|u\_m(t)\|\_{H^{\sigma}} \leqslant \|u\_0\|\_{H^{\sigma}} + \int\_0^t \|f(u\_m(s))\|\_{H^{\sigma}} \, ds + \left\|\sum\_j \gamma\_j \int\_0^t \mathcal{S}(0,s) g\_m(u\_m(s)) dW\_j(s)\right\|\_{H^{\sigma}},$$

where $\|f(u\_m(s))\|\_{H^\sigma} \leqslant C \left( \|u\_m(s)\|\_{H^{\sigma\_0}} + \|u\_m(s)\|\_{H^{\sigma\_0}}^2 \right) \|u\_m(s)\|\_{H^\sigma}$. Now, taking into account that $\|\mathcal{S}(0, s) g\_m(u\_m(s))\|\_{H^\sigma} \leqslant C \|u\_m(s)\|\_{H^{\sigma\_0}} \|u\_m(s)\|\_{H^\sigma}$, the stochastic integral can be estimated by the Burkholder inequality, and so we obtain for any $0 < T \leqslant T\_0$ the following inequality

$$\mathbb{E} \sup\_{t \in [0, T]} \|u\_m(t)\|\_{H^{\sigma}}^2 \leqslant 3\, \mathbb{E} \left\|u\_0\right\|\_{H^{\sigma}}^2 + C\, \mathbb{E} \int\_0^T \left(\left\|u\_m(t)\right\|\_{H^{\sigma\_0}}^2 + \left\|u\_m(t)\right\|\_{H^{\sigma\_0}}^4\right) \|u\_m(t)\|\_{H^{\sigma}}^2 dt,\tag{12}$$

where $C$ depends only on $\sigma\_0$, $\sigma$, $T\_0$ and $\sum\_j \gamma\_j^2$. We will use this inequality iteratively on the intervals $[0, T\_0 \wedge k T\_1]$, $k \in \mathbb{N}$, with $T\_1$ found in Lemma 1. Let $\|u\_0\|\_{\mathcal{H}} \leqslant (5 C\_{\mathcal{H}})^{-1}$ a.e. on $\Omega$. Consider the following stopping time

$$T\_2^m = \inf \left\{ t \in [0, T\_0] : \|u\_m(t)\|\_{\mathcal{H}} > (2C\_{\mathcal{H}})^{-1} \right\}.$$

Then a.e. $T_1 \leqslant T_2^m$. Indeed, assuming the contrary $T_1 > T_2^m$, one can deduce from (10) and Lemma 1 that

$$\begin{aligned} \|u_m(T_2^m)\|_{\mathcal{H}} &\leqslant \sqrt{2\mathcal{H}(u_m(T_2^m))} \leqslant 2\sqrt{\mathcal{H}(u_0)} \leqslant 2\sqrt{1 + C_{\mathcal{H}}\,\|u_0\|_{\mathcal{H}}}\;\|u_0\|_{\mathcal{H}} \\ &\leqslant \sqrt{\frac{24}{125}}\,C_{\mathcal{H}}^{-1} \leqslant \left(2C_{\mathcal{H}}\right)^{-1}, \end{aligned}$$
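The numerical constant in the last line follows directly from the smallness assumption on the initial data: with $\|u_0\|_{\mathcal{H}} \leqslant (5C_{\mathcal{H}})^{-1}$,

$$2\sqrt{1 + C_{\mathcal{H}}\,\|u_0\|_{\mathcal{H}}}\;\|u_0\|_{\mathcal{H}} \leqslant 2\sqrt{1+\tfrac{1}{5}}\,\frac{1}{5C_{\mathcal{H}}} = \frac{2}{5}\sqrt{\frac{6}{5}}\,C_{\mathcal{H}}^{-1} = \sqrt{\frac{24}{125}}\,C_{\mathcal{H}}^{-1} < \frac{1}{2}\,C_{\mathcal{H}}^{-1}.$$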

which contradicts the definition of the stopping time $T_2^m$ due to the continuity of $\|u_m\|_{\mathcal{H}}$. As a result, $\|u_m\|_{\mathcal{H}}$ stays bounded by $(2C_{\mathcal{H}})^{-1}$ on the interval $[0, T_1]$ for a.e. $\omega$, and this simplifies (12) in the following way:

$$\mathbb{E}\sup_{t\in[0,T]} \|u_m(t)\|^2_{H^{\sigma}} \leqslant 3\,\mathbb{E}\|u_0\|^2_{H^{\sigma}} + C\int_0^T \mathbb{E}\sup_{s\in[0,t]} \|u_m(s)\|^2_{H^{\sigma}}\, dt$$

holding true for any $0 < T \leqslant T_1$. Hence by Grönwall's lemma we obtain

$$\|u_m\|^2_{L^2 C(0,T_1;H^{\sigma})} \leqslant 3\,\|u_0\|^2_{L^2 H^{\sigma}}\, e^{CT_1} = M,$$

where $M$ does not depend on $m \in \mathbb{N}$. Hence

$$\mathbb{P}(\tau_m \geqslant T_1) = \mathbb{P}\left( \|u_m\|_{C(0,T_1;H^{\sigma})} \leqslant m \right) \geqslant 1 - \frac{1}{m^2}\,\mathbb{E}\|u_m\|^2_{C(0,T_1;H^{\sigma})} \geqslant 1 - \frac{M}{m^2},$$

and so $[0, T_1] \subset \bigcup_{m\in\mathbb{N}} [0, \tau_m(\omega)]$ for a.e. $\omega \in \Omega$. Thus we can define $u$ on $[0, T_1]$ by assigning $u = u_m$ on $[0, \tau_m]$. This is obviously a solution of (5) on $[0, T_1]$ satisfying $d\mathcal{H}(u) = 0$ and $\|u\|_{\mathcal{H}} < (2C_{\mathcal{H}})^{-1}$ for a.e. $\omega \in \Omega$.

Now one can repeat the argument on $[T_1, 2T_1]$ by constructing new solutions $u_m$ of Eq. (6) with the initial data $u(T_1)$ given at the time moment $t_0 = T_1$. The stopping times $\tau_m$ are defined by (11) with $t_0 = T_1$. The fact that $\|u_m\|_{\mathcal{H}}$ does not exceed the level $(2C_{\mathcal{H}})^{-1}$ is guaranteed by the energy conservation, namely by $\mathcal{H}(u(T_1)) = \mathcal{H}(u_0)$, in the same manner as above. The rest is similar, and so we get a solution on $[T_1, 2T_1]$ with constant energy equal to $\mathcal{H}(u_0)$. After several repetitions of the argument we construct a solution on $[0, T_0]$.

It remains to prove the uniqueness. Let $u_1, u_2 \in L^2(\Omega; C(0, T_0; H^{\sigma}(\mathbb{R})))$ solve Eq. (5). For $R > 0$ we introduce

$$\tau\_R = \inf \left\{ t \in [0, T\_0] : \max\_{l=1,2} \|u\_l(t)\|\_{H^{\sigma}} > R \right\}.$$

Clearly, for a.e. $\omega \in \Omega$ both $u_1$ and $u_2$ are solutions of (6) on $[0, \tau_R]$. By Proposition 1 it holds true that $u_1 = u_2$ on $[0, \tau_R]$ for a.e. $\omega \in \Omega$. Taking $R \in \mathbb{N}$ and exploiting the time-continuity of $u_1$, $u_2$, one obtains $u_1 = u_2$ on $[0, \lim_{R\to\infty} \tau_R]$ for a.e. $\omega \in \Omega$. Now from sub-additivity and Chebyshev's inequality we deduce

$$\begin{aligned} \mathbb{P}(\tau_R \geqslant T_0) &= \mathbb{P}\left( \max_{l=1,2} \|u_l\|_{C(0,T_0;H^{\sigma})} \leqslant R \right) \\ &\geqslant 1 - \frac{1}{R^2}\,\mathbb{E}\left( \|u_1\|^2_{C(0,T_0;H^{\sigma})} + \|u_2\|^2_{C(0,T_0;H^{\sigma})} \right) \to 1 \end{aligned}$$

as $R \to \infty$, proving $u_1 = u_2$ on $[0, T_0]$. This concludes the proof of Theorem 1.

**Acknowledgments** The author is grateful to the members of the STUOD team for fruitful discussions and numerous helpful comments. The author acknowledges the support of the ERC EU project 856408-STUOD.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Observation-Based Noise Calibration: An Efficient Dynamics for the Ensemble Kalman Filter**

**Benjamin Dufée, Etienne Mémin, and Dan Crisan**

**Abstract** We investigate the calibration of the stochastic noise in order to guide the realizations towards the observational data used for the assimilation. This is done in the context of the stochastic parametrization under Location Uncertainty (LU) and data assimilation. The new methodology is rigorously justified by the use of the Girsanov theorem, and yields significant improvements in the experiments carried out on the Surface Quasi Geostrophic (SQG) model, when applied to Ensemble Kalman filters. The particular test case studied here shows improvements of the peak MSE from 85% to 93%.

**Keywords** Stochastic parametrization · Modeling under location uncertainty · Noise calibration · Ensemble Kalman filters · Square root filters

#### **1 Introduction**

Sequential data assimilation uses observational data to correct a set of realizations given by a numerical model. When both the data and the model are high-dimensional, the assimilation can be facilitated by a procedure that guides the realizations towards the available observations. This is particularly helpful in high dimensions as it enables the ensemble to focus on a restricted region of the state space. That is what we intend to put forward in this paper. This work relies on a stochastic parametrization of the underlying dynamical system based on the Location Uncertainty (LU) principles, which rely on a decomposition of the Lagrangian velocity into a large-scale smooth component and a random time-uncorrelated component. In this setting, a stochastic transport operator plays the

B. Dufée (✉) · E. Mémin
Inria/Irmar, Fluminance, Campus universitaire de Beaulieu, Rennes Cedex, France
e-mail: benjamin.dufee@inria.fr; etienne.memin@inria.fr

D. Crisan
Department of Mathematics, Imperial College, London, UK
e-mail: d.crisan@imperial.ac.uk

role of the usual material derivative; see [1] for more details. This work adds a noise specifically calibrated to play a guiding role for the realizations.

In a previous data assimilation study on the Surface Quasi-Geostrophic (SQG) model, the stochastic forecast was shown to provide better results than deterministic techniques such as variance inflation with perturbation of the initial condition; see [2] for details. The current study is a continuation of [2]. The noise calibration presented here further improves the results presented in [2], particularly when the system starts from poor or badly estimated initial conditions (for instance resulting from initial estimations relying on regularized inverse problems). For such initial conditions, which are generally too smooth and inaccurate, classical ensemble methods are likely to run into difficulties. In this short paper, we first briefly recall the principles of Location Uncertainty and how they apply to the SQG model. We then detail the procedure leading to the noise calibration, and finally describe and assess the numerical experiments performed.

#### **2 The Stochastic SQG Model Under Location Uncertainty (LU)**

The analysis in this paper is carried out on the 2D Surface Quasi-Geostrophic (SQG) model. The SQG equations model an idealized dynamics for surface oceanic currents. It involves many realistic non-linear features such as fronts or strong multiscale eddies (see [3, 4] for details). The deterministic SQG model couples a transport equation of the buoyancy field *b*, a kinematic condition and a 2D divergence-free constraint:

$$\mathrm{D}_t b = 0\;;\quad b = \frac{N_{strat}}{f_0}(-\Delta)^{\frac{1}{2}}\psi\;;\quad \boldsymbol{v} = \nabla^{\perp}\psi, \tag{1}$$

expressed in terms of the stream function $\psi$ and the velocity $\boldsymbol{v}$, where $\mathrm{D}_t$ is the material derivative. The kinematic condition depends on the stratification $N_{strat}$ and the Coriolis frequency $f_0$.

The corresponding stochastic dynamics is derived from the Location Uncertainty (LU) principles described in [1]. The full description and numerical analysis of the LU-SQG model can be found in [5, 6]. This stochastic formalism models the impact of the small scales on the flow component that is initially smooth in time. It relies on the decomposition of the Lagrangian velocity of a fluid particle positioned at $\mathbf{x}_t$ in a spatial domain $\Omega \subset \mathbb{R}^2$:

$$\mathrm{d}\mathbf{x}_t = \boldsymbol{v}(\mathbf{x}_t, t)\mathrm{d}t + \sigma(\mathbf{x}_t, t)\mathrm{d}B_t, \tag{2}$$

in terms of a resolved component $\boldsymbol{v}$ (referred to as the large-scale component in the following) and $\sigma\mathrm{d}B_t$, an unresolved highly oscillating random component, built from a (cylindrical) Wiener process $B_t$ (i.e. a well-defined Brownian motion taking values in a functional space) [7]. The increments of the latter component are independent in time. Due to the lack of smoothness of the solution $\mathbf{x}_t$, we rigorously understand (2) in its integral form.

The random perturbation of velocity is Gaussian and has the following distribution:

$$\sigma\mathrm{d}B_t \sim \mathcal{N}(0, Q\,\mathrm{d}t), \tag{3}$$

where *Q* is the covariance operator. This operator admits an orthonormal eigenfunction basis {*φn(*·*,t)*}*n*∈<sup>N</sup> with non-negative eigenvalues *(λn(t))n*∈N. This generates a convenient spectral definition of the noise as

$$\sigma(\mathbf{x},t)\mathrm{d}B_t = \sum_{n\in\mathbb{N}} \sqrt{\lambda_n(t)}\,\phi_n(\mathbf{x},t)\,\mathrm{d}\beta_t^n, \tag{4}$$

where the $\beta^n$ are i.i.d. standard one-dimensional Brownian motions. From Eq. (4), the noise variance tensor $a$ is then defined by

$$a(\mathbf{x}, t) = \sum_{n\in\mathbb{N}} \lambda_n(t)\,\phi_n(\mathbf{x},t)\,\phi_n(\mathbf{x},t)^T. \tag{5}$$

It can be noticed that the variance tensor has the physical dimension of a viscosity (i.e. $\mathrm{m}^2/\mathrm{s}$): since $\sigma\mathrm{d}B_t$ is a distance, $a(\mathbf{x},t)\mathrm{d}t = \mathbb{E}[\sigma\mathrm{d}B_t(\sigma\mathrm{d}B_t)^T]$ is a squared distance. The procedure used to generate the orthonormal basis functions determines the spatial structure of the noise. The one used in our experiments will be presented later in this section.
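The spectral definition (4) and the variance tensor (5) can be illustrated with a small self-contained sketch (the toy modes, eigenvalues and grid size are hypothetical; only numpy is assumed): we sample noise increments from the modes and check that their empirical covariance approaches $a\,\mathrm{d}t$.

```python
import numpy as np

def sample_noise(phis, lams, dt, rng):
    """One increment sigma dB_t = sum_n sqrt(lam_n) phi_n dbeta_t^n, as in Eq. (4).

    phis : (K, M) orthonormal modes phi_n sampled on an M-point grid
    lams : (K,) non-negative eigenvalues of the covariance operator Q
    """
    dbeta = rng.normal(0.0, np.sqrt(dt), size=len(lams))  # dbeta^n ~ N(0, dt)
    return np.tensordot(np.sqrt(lams) * dbeta, phis, axes=1)

rng = np.random.default_rng(0)
# Two toy orthonormal "modes" on a 4-point grid (hypothetical values).
phis = np.array([[0.5, 0.5, 0.5, 0.5],
                 [0.5, -0.5, 0.5, -0.5]])
lams = np.array([2.0, 0.5])
dt = 1e-2

# The empirical covariance of many increments should approach a*dt, Eq. (5).
samples = np.stack([sample_noise(phis, lams, dt, rng) for _ in range(50000)])
emp_cov = samples.T @ samples / len(samples)
a = sum(l * np.outer(p, p) for l, p in zip(lams, phis))
assert np.allclose(emp_cov, a * dt, atol=5e-4)
```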

While a deterministically transported tracer $\Theta$ has zero material derivative, $\mathrm{D}_t\Theta = \partial_t\Theta + \boldsymbol{v}\cdot\nabla\Theta = 0$, in the LU framework a stochastically transported tracer cancels a related stochastic transport operator defined as:

$$\mathbf{D}_t\Theta := \mathrm{d}_t\Theta + (\boldsymbol{v}^*\mathrm{d}t + \sigma\mathrm{d}B_t)\cdot\nabla\Theta - \frac{1}{2}\nabla\cdot(a\nabla\Theta)\,\mathrm{d}t, \tag{6}$$

where

$$\mathrm{d}_t\Theta := \Theta(\mathbf{x}, t+\mathrm{d}t) - \Theta(\mathbf{x}, t) \tag{7}$$

is the infinitesimal forward time increment of the tracer. The effective advection velocity is defined by

$$\boldsymbol{v}^* = \boldsymbol{v} - \frac{1}{2}\nabla\cdot a, \tag{8}$$

the term $\sigma\mathrm{d}B_t \cdot \nabla\Theta$ is a non-Gaussian multiplicative noise corresponding to the tracer's transport by the small-scale flow, and the last term in (6) is a diffusion term, as the variance tensor $a$ is positive definite. The expression of the stochastic transport operator comes from a generalized Itô formula (the Itô-Wentzell formula); see [5] for more details.

The stochastic version of the SQG model is obtained by replacing the material derivative D*tb* in Eq. (1) with the stochastic transport operator **D***tb*:

$$\mathbf{D}_t b = \mathrm{d}_t b + (\boldsymbol{v}^*\mathrm{d}t + \sigma\mathrm{d}B_t)\cdot\nabla b - \frac{1}{2}\nabla\cdot(a\nabla b)\,\mathrm{d}t = 0, \tag{9}$$

together with an additional incompressibility constraint on the noise:

$$\nabla\cdot\sigma\mathrm{d}B_t = 0. \tag{10}$$

In the case of a compressible random field, the modified advection incorporates an additional term in Eq. (8) related to the noise divergence [5]. One essential property of LU (for a divergence-free noise component) is the conservation of energy for the transported random tracer, under the same ideal boundary conditions as in the deterministic case:

$$\mathrm{d} \int\_{\varOmega} \Theta^2(\mathbf{x}) \mathrm{d}x = 0,\tag{11}$$

and, very importantly, this energy conservation property holds pathwise (i.e. for any realization of the Brownian noise); see [5, 8] for details. This property highlights the strong relation between the LU-SQG version and the deterministic one.

**Noise Generation** The method used to generate the noise in this study relies on a data-driven technique called proper orthogonal decomposition (POD) to estimate the empirical orthogonal functions in the spectral representation of Eq. (4). By a slight abuse of language, this noise will be referred to as POD noise. We give some brief details in what follows.

Considering a series of snapshots of the velocity field, this method computes the covariance tensor around the temporal mean of the series. Its eigenvalues and eigenfunctions can then be estimated in order to separate the large-scale variability (the first "modes", or eigenfunctions) from the small-scale one (the smaller modes). In practice, this procedure is applied to coarse-grained high-resolution snapshots of deterministic simulations. The latter modes are the ones on which the noise is decomposed. These modes are divergence-free and stationary by construction, so the global structure of the noise does not vary in time. For chaotic geophysical models like this one, we can also use online-computed noises such as the one used in our previous work [2], which provide much better uncertainty quantification but are also much more expensive; an extension of this work to such noises is in progress. We refer to [6] for a precise description of this procedure.
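The snapshot-POD step described above can be sketched as follows (a minimal illustration with flattened toy snapshots; function names and the large/small-scale split index are ours, not the paper's):

```python
import numpy as np

def pod_modes(snapshots, n_large):
    """Estimate POD modes from snapshots (one flattened field per row).

    Returns the small-scale modes (beyond the first n_large) and their
    eigenvalues; the first n_large modes carry the large-scale variability.
    """
    fluct = snapshots - snapshots.mean(axis=0)      # remove the temporal mean
    # SVD of the fluctuation matrix <-> eigendecomposition of the covariance.
    _, s, vt = np.linalg.svd(fluct, full_matrices=False)
    lams = s**2 / len(snapshots)                    # covariance eigenvalues
    return vt[n_large:], lams[n_large:]             # orthonormal spatial modes

rng = np.random.default_rng(1)
snaps = rng.normal(size=(50, 16))                   # toy coarse-grained snapshots
phi_small, lam_small = pod_modes(snaps, n_large=5)
# The returned modes are orthonormal by construction.
assert np.allclose(phi_small @ phi_small.T, np.eye(len(phi_small)), atol=1e-10)
```

In the paper the divergence-free property of the modes comes from the divergence-free velocity snapshots themselves; this toy example only illustrates the mean removal, eigendecomposition and mode-splitting steps.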

#### **3 Girsanov Theorem and Noise Calibration**

#### *3.1 Change of Measure*

Ensemble-based sequential data assimilation filters are composed of a forecasting step, providing a sampling of the forecast distribution, and an analysis step, correcting the departure from the observations. The purpose of the proposed noise calibration is to modify the forecast distribution, taking into account the upcoming observation, in order to guide the forecast towards it. In the context of transport equations such as the SQG model, this extra guiding term is a drift added to the noise $\sigma\mathrm{d}B_t$, which was initially built to have zero mean. Allowing $\sigma\mathrm{d}B_t$ to have a non-zero mean entails a modification of the transport equation in order to rewrite it in terms of a centered noise. This is the Girsanov transform: a change of underlying measure under which a non-centered noise becomes centered, up to a drift term accounting for the change of measure. For now, $\sigma\mathrm{d}B_t$ is defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and we denote by $(\mathcal{F}_t)_t$ the filtration generated by $\sigma\mathrm{d}B_t$.

The Girsanov theorem (see [7] for details) states that if *(Yt)*<sup>0</sup>≤*t*≤*<sup>T</sup>* is a stochastic process such that:


– The process satisfies

$$\int_0^T Y_t^2\,\mathrm{d}t < \infty.$$

– The process *(Zt)*<sup>0</sup>≤*t*≤*<sup>T</sup>* defined by

$$Z_t = \exp\left( \int_0^t Y_s\,\mathrm{d}B_s - \frac{1}{2}\int_0^t Y_s^2\,\mathrm{d}s \right) \tag{12}$$

is an $\mathcal{F}_t$-martingale,

then there exists a probability measure P˜ under which:

– The process *(B*˜*t)*<sup>0</sup>≤*t*≤*<sup>T</sup>* defined by

$$\tilde{B}_t = B_t - \int_0^t Y_s\,\mathrm{d}s \tag{13}$$

is a standard cylindrical Wiener process.

– The Radon-Nikodym derivative of P˜ with respect to P is *ZT* .

Let us denote by *(Γt)*<sup>0</sup>≤*t*≤*<sup>T</sup>* the drift we intend to add to the noise. With such a change of measure, let us see how Eq. (9) is modified. According to Eq. (13), we have

$$\mathrm{d}B_t = \mathrm{d}\tilde{B}_t + \Gamma_t\,\mathrm{d}t, \tag{14}$$

so the stochastic transport operator rewrites

$$\mathbf{D}_t b = \mathrm{d}_t b + (\boldsymbol{v}^*\mathrm{d}t + \sigma[\mathrm{d}\tilde{B}_t + \Gamma_t\mathrm{d}t])\cdot\nabla b - \frac{1}{2}\nabla\cdot(a\nabla b)\,\mathrm{d}t \tag{15a}$$

$$\phantom{\mathbf{D}_t b} = \mathrm{d}_t b + (\boldsymbol{v}^*\mathrm{d}t + \boldsymbol{v}_\Gamma\mathrm{d}t + \sigma\mathrm{d}\tilde{B}_t)\cdot\nabla b - \frac{1}{2}\nabla\cdot(a\nabla b)\,\mathrm{d}t, \tag{15b}$$

where

$$\boldsymbol{v}_\Gamma = \sum_{k=1}^K \gamma_k\phi_k \tag{16}$$

is the velocity drift entailed by the Girsanov transform. We assume that $\Gamma_t = \Gamma = (\gamma_1,\dots,\gamma_K)$ is constant on a small time step $\mathrm{d}t$, which is the case for the discretized numerical scheme that we use.

As a result, under the probability measure $\tilde{\mathbb{P}}$, (15) takes the same form as Eq. (9), since $\tilde{B}$ is indeed a centered cylindrical Wiener process under $\tilde{\mathbb{P}}$, but with an additional drift in the advection velocity.

#### *3.2 Computation of the Girsanov Drift*

We now describe how to compute *Γ* in order to guide the forecast towards the next observation.

Let us start from a given time $t_1$ at which a complete buoyancy and velocity field is available. The next observation $b^{obs}(\cdot, t_2)$ is assumed to be available at time $t_2$, and $L$ numerical time steps are performed until then ($t_2 - t_1 = L\delta_t$, where $\delta_t$ is the time discretization step).

At time $t_1$, a rough prediction of the buoyancy at time $t_2$ can be estimated with the current velocity (which, more precisely, comes from previous stochastic iterations, but is $\mathcal{F}_{t_1}$-measurable), namely

$$b^{obs}\left( \mathbf{x} + \boldsymbol{v}(\mathbf{x}, t_1)L\delta_t,\; t_2 \right) := \tilde{b}(\mathbf{x}, t_2), \tag{17}$$

which stands for the backward-registered observation with respect to the current deterministic velocity. In this way, the error made is

$$\Delta_t\tilde{b}(\mathbf{x}) = \tilde{b}(\mathbf{x}, t_2) - b(\mathbf{x}, t_1). \tag{18}$$

So $\tilde{b}(\mathbf{x}, t_2)$ is a value taken in a modified observation field, because $b^{obs}$ is advected by the current velocity $\boldsymbol{v}(\cdot, t_1)$. For this reason we consider that the backward-registered observation used for the calibration does not have the same nature as the raw observation used for data assimilation. It constitutes a pseudo-observation, for which the error due to the imprecision of the backward registration (ensuing in particular from successive bilinear interpolations) is much larger than the observation noise, and almost uncorrelated with the latter. For the assimilation itself, only the raw observation is used in the Kalman filter, affected only by the observation noise. The aim is now to calibrate the current velocity by adding a Girsanov drift $\boldsymbol{v}_\Gamma = \sum_{k=1}^K \gamma_k\phi_k$, such that the following transport equation
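The backward registration of Eq. (17) can be sketched with a minimal bilinear-interpolation routine on a doubly periodic grid (a hypothetical implementation; the paper's own interpolation details may differ):

```python
import numpy as np

def backward_register(b_obs, v, L, dt_step, dx):
    """b_tilde(x) = b_obs(x + v(x, t1) * L * dt), Eq. (17), via bilinear
    interpolation on a doubly periodic grid.
    b_obs : (ny, nx) field at time t2;  v : (2, ny, nx) velocity at time t1.
    """
    ny, nx = b_obs.shape
    jj, ii = np.meshgrid(np.arange(nx), np.arange(ny))
    # Departure points x + v*L*dt, expressed in grid units.
    x = jj + v[0] * L * dt_step / dx
    y = ii + v[1] * L * dt_step / dx
    i0, j0 = np.floor(y).astype(int), np.floor(x).astype(int)
    fy, fx = y - i0, x - j0
    def at(i, j):                       # periodic indexing
        return b_obs[i % ny, j % nx]
    return ((1 - fy) * (1 - fx) * at(i0, j0) + (1 - fy) * fx * at(i0, j0 + 1)
            + fy * (1 - fx) * at(i0 + 1, j0) + fy * fx * at(i0 + 1, j0 + 1))

# A uniform displacement of exactly one grid cell shifts the field by one pixel.
b = np.arange(16.0).reshape(4, 4)
v = np.ones((2, 4, 4))
b_tilde = backward_register(b, v, L=1, dt_step=1.0, dx=1.0)
assert np.allclose(b_tilde, np.roll(b, shift=(-1, -1), axis=(0, 1)))
```

Iterating such interpolations is precisely the source of the registration error discussed above.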

$$b\left( \mathbf{x} + \boldsymbol{v}(\mathbf{x}, t_1)L\delta_t + \boldsymbol{v}_\Gamma L\delta_t + \sum_{k=1}^K (\sqrt{\delta_t}\,\phi_k)(\sqrt{L\delta_t}\,\beta_k),\; t_2 \right) = b(\mathbf{x}, t_1) \tag{19}$$

is approximated in a least-squares sense. In other words, we solve the following minimization problem:

$$\min_\Gamma \int_\Omega \mathbb{E}\left[ b\left( \mathbf{x} + \boldsymbol{v}(\mathbf{x}, t_1)L\delta_t + \boldsymbol{v}_\Gamma L\delta_t + \sum_{k=1}^K (\sqrt{\delta_t}\,\phi_k)(\sqrt{L\delta_t}\,\beta_k),\; t_2 \right) - b(\mathbf{x}, t_1) \right]^2 \mathrm{d}\mathbf{x}. \tag{20}$$

This can be rewritten as

$$\min_\Gamma \int_\Omega \left[ \Delta_t\tilde{b} + \nabla\tilde{b}\cdot\boldsymbol{v}_\Gamma L\delta_t - \frac{1}{2}\nabla\tilde{b}\cdot(\nabla\cdot a)L\delta_t - \frac{1}{2}\nabla\cdot(a\nabla\tilde{b})L\delta_t \right]^2 \mathrm{d}\mathbf{x}.$$

Using the identities

$$\nabla \cdot a = \sum\_{k=1}^{K} (\phi\_k \cdot \nabla)\phi\_k \; ; \; \nabla \cdot (a \nabla b) = \sum\_{k=1}^{K} (\phi\_k \cdot \nabla)(\phi\_k \cdot \nabla b), \tag{21}$$

we rewrite the minimization problem as

$$\min_\Gamma \int_\Omega \left[ \Delta_t\tilde{b} + \nabla\tilde{b}\cdot\left( \sum_{k=1}^K \gamma_k\phi_k \right)L\delta_t - \frac{1}{2}\sum_{k=1}^K \left( \nabla\tilde{b}\cdot F_k + G_k(\tilde{b}) \right)L\delta_t \right]^2 \mathrm{d}\mathbf{x}, \tag{22}$$

where

$$F_k = (\phi_k\cdot\nabla)\phi_k\;;\quad G_k(\tilde{b}) = (\phi_k\cdot\nabla)(\phi_k\cdot\nabla\tilde{b}).$$

Denoting by *J* the integrand, we have

$$\begin{aligned} \frac{\partial J}{\partial\gamma_l} = 2\int_\Omega (\nabla\tilde{b}\cdot\phi_l)L\delta_t \Bigg[ \Delta_t\tilde{b} &+ \nabla\tilde{b}\cdot\left( \sum_{k=1}^K \gamma_k\phi_k \right)L\delta_t \\ &- \frac{1}{2}\sum_{k=1}^K \left( \nabla\tilde{b}\cdot F_k + G_k(\tilde{b}) \right)L\delta_t \Bigg]\, \mathrm{d}\mathbf{x}. \end{aligned} \tag{23}$$

Finally, we add a regularization term $\alpha\|\boldsymbol{v}_\Gamma\|_2^2 = \alpha\sum_{k=1}^K \gamma_k^2\lambda_k$, where $\lambda_k$ is the eigenvalue associated with the $Q$-eigenfunction $\phi_k$, to the functional in Eq. (22) in order to ensure the uniqueness of the solution of the proposed minimization problem; the parameter $\alpha$ needs to be tuned properly. As a result, the minimization problem can be written as an inverse problem

$$A\varGamma = c \tag{24}$$

where

$$A_{lk} := 2\int_\Omega (\nabla\tilde{b}\cdot\phi_l)(\nabla\tilde{b}\cdot\phi_k)\,\mathrm{d}\mathbf{x} + 2\alpha\lambda_k\delta_{lk}, \tag{25a}$$

$$c_l := \int_\Omega (\nabla\tilde{b}\cdot\phi_l)\left[ 2\Delta_t\tilde{b} - \sum_{k=1}^K \left( \nabla\tilde{b}\cdot F_k + G_k(\tilde{b}) \right) \right] \mathrm{d}\mathbf{x}. \tag{25b}$$

The parameter $\alpha$ is fixed a priori in order to control the resulting Euclidean norm $\|\boldsymbol{v}_\Gamma\|_2$. Large values of $\alpha$ lead to very small corrections ($\Gamma$ tends to $(0,\dots,0)$ as $\alpha$ goes to $+\infty$), whereas small values yield very strong and noisy drifts, as we get closer to an ill-posed problem. For now, we use an empirical iterative procedure to tune $\alpha$: we increase it until the resulting norm of $\boldsymbol{v}_\Gamma$ is under a given threshold.
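Given the assembled data term of (25a) and right-hand side (25b), the solve and the empirical $\alpha$-tuning loop can be sketched as follows (a hypothetical minimal implementation with toy inputs; the weighted drift norm follows the text's convention $\|\boldsymbol{v}_\Gamma\|_2^2 = \sum_k \gamma_k^2\lambda_k$):

```python
import numpy as np

def solve_drift(A0, lams, c, alpha, max_norm):
    """Solve the regularized inverse problem A Gamma = c of Eqs. (24)-(25),
    doubling alpha until the drift norm falls below max_norm (the empirical
    tuning loop described in the text).

    A0   : (K, K) data part of A, i.e. 2 * int (grad b.phi_l)(grad b.phi_k) dx
    lams : (K,) eigenvalues lam_k entering the Tikhonov term 2*alpha*lam_k
    """
    while True:
        gamma = np.linalg.solve(A0 + 2.0 * alpha * np.diag(lams), c)
        # Weighted norm ||v_Gamma||_2^2 = sum_k gamma_k^2 * lam_k (text's convention).
        if np.sqrt(np.sum(gamma**2 * lams)) <= max_norm:
            return gamma, alpha
        alpha *= 2.0

rng = np.random.default_rng(2)
B = rng.normal(size=(4, 4))
A0 = B @ B.T + 4 * np.eye(4)          # toy symmetric positive-definite data term
lams = np.array([1.0, 0.5, 0.25, 0.1])
c = rng.normal(size=4)
gamma, alpha = solve_drift(A0, lams, c, alpha=1.0, max_norm=10.0)
assert np.allclose((A0 + 2 * alpha * np.diag(lams)) @ gamma, c)
```

The doubling schedule is one simple choice; any monotone increase of $\alpha$ until the threshold is met implements the same empirical tuning.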

#### **4 Experiments**

This section details the numerical experiments carried out in this work. The goal is to study the benefits brought by a noise-calibrated forecast in an up-to-date version of a localized ensemble Kalman filter. In particular we wish to observe whether or not the noise calibration brings by itself an efficient and practical improvement of the assimilation step.

Ensemble Kalman filters (see e.g. [9] for details) constitute a well-known family of data assimilation methods. They rely on an ensemble of realizations (called ensemble members) of a dynamical system, $(x_n^f)_{n=1,\dots,N}$, coming from the forecast step, and give as an output another set of members, $(x_n^a)_{n=1,\dots,N}$. Each posterior ensemble member $x_n^a$ is obtained as a linear combination of the prior ensemble members $(x_n^f)_{n=1,\dots,N}$ that minimizes, in some sense, the distance between the ensemble and the observation.
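The analysis step just described can be sketched with a standard stochastic-EnKF update (a textbook variant, not the exact localized ESRF used later in the paper; all names and toy dimensions are ours):

```python
import numpy as np

def enkf_analysis(X, y, H, R, rng):
    """Stochastic-EnKF analysis: each posterior member is a linear combination
    of the prior members pulled towards a perturbed copy of the observation.
    X : (n, N) prior ensemble; y : (m,) observation; H : (m, n); R : (m, m)."""
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)        # ensemble anomalies
    S = H @ A                                    # observation-space anomalies
    C = S @ S.T / (N - 1) + R                    # innovation covariance
    K = (A @ S.T / (N - 1)) @ np.linalg.inv(C)   # ensemble Kalman gain
    # Perturbed observations, one per member.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=N).T
    return X + K @ (Y - H @ X)                   # posterior ensemble

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(6, 40))           # toy prior: 40 members, state dim 6
H = np.eye(2, 6)                                 # observe the first two components
y = np.array([2.0, -1.0])
Xa = enkf_analysis(X, y, H, 0.1 * np.eye(2), rng)
# The analysis mean moves towards the observation in the observed components.
assert np.all(np.abs((H @ Xa).mean(axis=1) - y) < np.abs((H @ X).mean(axis=1) - y))
```

This illustrates the point made in the text: the posterior lives in the span of the prior members, which is why an ensemble with a shared bias cannot be corrected by the analysis step alone.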

One important assumption of the classical EnKF is that the observation and model noise are uncorrelated. The observation-calibrated forecast could imply that this assumption no longer holds. Still, the discussion following Eq. (18) on the nature of the observation explains why we can consider the forecast and observation noise uncorrelated. If this assumption turns out not to be valid, we refer to the work in [10] for a rigorous justification of an observation-dependent forecast; in that work, both Kalman and particle filter equations were rewritten in terms of the conditional expectation with respect to the underlying sequence of current and past observations.

The stochastic simulations are run on a doubly periodic simulation grid, $G_s$, of size 64 × 64 points and of physical size 1000 km × 1000 km, meaning that two neighboring points are approximately 15 km apart. An observation is assumed to be available every day (i.e. every 600 time steps of the dynamics) on a coarser observation grid, $G_o$, a subset of $G_s$ of size 16 × 16. It is generated as follows: a trajectory of buoyancy $(z_t)_t$ is run from the deterministic model (PDE) on a very fine resolution grid, $G_f$, of size 512 × 512. Then a convolution-decimation procedure $D$ is applied in order to fit the targeted simulation grid $G_s$. It consists in the composition of a Gaussian filter and a decimation operator subsampling one pixel out of two, and has to be iterated three times in our case to reach the correct resolution. This is done in order to respect Shannon's sampling theorem and to avoid spectrum folding. A projection operator $P$ is applied from $G_s$ to $G_o$, and we finally add an observation noise to get the observation

$$b^{obs}(\cdot, t) = P \circ D(z_t) + \eta_t\;;\quad \eta_t \sim \mathcal{N}(0, R)\ \text{ and }\ R = r^2 I_M, \tag{26}$$

where *R* is the diagonal observation covariance matrix and *M* is the number of points on the observation grid.
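The observation-generation pipeline of Eq. (26) can be sketched as follows (a minimal illustration; the filter width `sigma` and function names are our assumptions, and `scipy.ndimage.gaussian_filter` with periodic boundaries stands in for the paper's Gaussian filter):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def coarse_grain(z, n_iter=3, sigma=1.0):
    """Convolution-decimation operator D: a periodic Gaussian filter followed
    by one-pixel-out-of-two subsampling, iterated n_iter times (512 -> 64)."""
    for _ in range(n_iter):
        z = gaussian_filter(z, sigma, mode='wrap')[::2, ::2]
    return z

def observe(z, obs_stride, r, rng):
    """Projection P onto the observation grid (every obs_stride-th point,
    64 -> 16) plus white observation noise of standard deviation r (R = r^2 I)."""
    zo = z[::obs_stride, ::obs_stride]
    return zo + rng.normal(0.0, r, size=zo.shape)

rng = np.random.default_rng(4)
z_fine = rng.normal(size=(512, 512))   # stand-in for the high-resolution truth
z_sim = coarse_grain(z_fine)           # simulation grid G_s, 64 x 64
b_obs = observe(z_sim, obs_stride=4, r=1e-5, rng=rng)
assert z_sim.shape == (64, 64) and b_obs.shape == (16, 16)
```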

**Numerical Setup** The simulations have been performed with a pseudo-spectral code in space (see [6] for details). The time scheme is a fourth-order Runge-Kutta scheme for the deterministic PDE, and an Euler-Maruyama scheme for the SPDEs. We use a standard hyperviscosity model to dissipate the energy at the resolution cut-off, with a hyperviscosity coefficient $\beta = (5\times 10^{29}\ \mathrm{m}^8\,\mathrm{s}^{-1})\,M_x^{-8}$, where $M_x$ is the grid resolution [6].

The test case considered in this study is the following: an ensemble of $N = 100$ members is started from the very same initial condition at day 0, which consists of two cold vortices to the north and two warm vortices to the south. However, the amplitude of the initial vortices is underestimated by 20% compared to the initial condition used for the deterministic run (considered as the truth), as shown in Fig. 1. We refer to [2] for a mathematical expression of this field.

In this experiment, we study the difference in efficiency of the localized Ensemble Square Root Filter (an up-to-date version of the Ensemble Kalman filter; see for instance [11] for details of the square root filters (ESRF) and [12] for a description of the observation covariance localization procedure) between the noise-calibrated forecast and classical stochastic simulations. We also refer to [13] for the

**Fig. 1** Initial conditions for the truth (on the left) and for each stochastic run (on the right, common to all ensemble members). We enforce an underestimation of the amplitude of the initial vortices of 20%

extension of the square root filter for additive forecast noise based on covariance transformation, where the advantages of additional model error in the forecast step are shown.

In both cases, starting from the underestimated initial condition, the stochastic dynamics is simulated using the POD noise with $K = 10$ modes. An observation is provided each day (i.e. every 600 time steps of the SPDE), with an observation error covariance set by $r = 10^{-5}$ in (26), which corresponds to a weak (but not negligible, 1% of the maximum amplitude of the initial buoyancy field) noise on the observation. The localization radius is set to $l_{obs}$ here, where $l_{obs} \approx 60$ km denotes the distance between two neighboring observational sites, as this provided the best results in both cases.

The typical behaviour of the vortices, at least at the beginning of the simulation, is to spin with no translation of the cores. In our case, the true vortices will spin much faster than those in the biased stochastic runs. The goal of calibration is then to speed these vortices up in order to get them closer to the truth.

The forecast is calibrated at each time step of the SPDE, using the upcoming observation. Multiple values were tried for the regularization parameter $\alpha$, or alternatively for the upper bound allowed for the $L^2$-norm of the Girsanov drift $\boldsymbol{v}_\Gamma$. Figure 2 compares the MSE along time for the whole range of parameters tested here, together with the same experiment without noise calibration. For the latter, the LESRF has a difficult task, as it tries to find linear combinations of the prior ensemble members, which all have an underestimated velocity, to get closer to the observation. This is a general issue for ensemble methods (as well as for particle filters), which are neither able nor designed to correct the bias if this correction is not made in the forecast. By contrast, the LU calibration offers an additional degree of freedom to guide the ensemble towards the observation. This procedure significantly improves the results in terms of MSE. At day 13, when the MSE is maximal for the usual case, we observe an improvement from 85% to 93% depending on the

**Fig. 2** Comparison of MSE along time between the non calibrated forecast (in black) and all the different parameters tested here for the noise calibration. The snapshots shown in Fig. 3 are taken at day 15 (black dashed line)

parameters tested. The underestimation case is one example, but we expect this procedure to be efficient in any situation in which all ensemble members share a similar problem: bias, badly estimated amplitude, artefacts, asymmetric features, etc. With a reasonably small ensemble size, which is generally the case in practice, this is likely to occur if the initial conditions have such features.

As explained previously, the regularization parameter $\alpha$ controls the amplitude of the allowed correction drift. In our experiments, all parameters tested yield significant improvements compared to the classical case; still, a good trade-off seems to be found with a control of $\|\boldsymbol{v}_\Gamma\|_2$ between 70 and 150. Above 150, we observe higher MSE in the very first days, most likely due to a lack of constraint on the inverse problem. In addition to the MSE results, we show in Fig. 3 a more visual example of what the calibration does. At day 15, the configuration of the truth is that all four vortices are horizontal. Without calibration (first row), the vortices are slanted because of the initial underestimation of the velocity: the velocity field has not been properly corrected. On the other hand, the LU calibration offers a more reliable prediction, as we recover the global shape of the vortices, with additional spread around the mean.

Finally, we show in Fig. 4 an insight into how the Girsanov correction $\boldsymbol{v}_\Gamma$ behaves in time. As the structure of the noise is stationary, so is the structure of $\boldsymbol{v}_\Gamma$, because it relies on the same modes as the noise. What is interesting is the evolution of the amplitude of this field, which decreases in time, meaning that most of the calibration

**Fig. 3** Comparison between the ensemble mean (left) and the ensemble standard deviation (right) maps, with and without calibration, at day 15, together with the high-resolution truth

work is done in the very first days of simulation, and once the forecast manages to get closer to the truth, the need for calibration is less crucial and the Girsanov correction gets weaker.

**Fig. 4** Vorticity of the Girsanov drift *vΓ* computed for one ensemble member at the first time step after the initial condition (left) and at the first time step after day 17 (right)

#### **5 Conclusion**

The findings of this paper show the ability of a data-driven noise calibration procedure to significantly improve the assimilation by EnKF of a system initialized with an underestimated initial condition.

As already mentioned in Sect. 2, we intend to extend this setting to non-stationary noises, as they were shown to be associated with a better quantification of the uncertainty (see [6] for details). Regarding computational effort, the calibration procedure is intrinsically parallelizable ensemble-wise, and the techniques used are close to optical-flow estimation procedures, for which efficient solutions exist. The tuning of *α* is currently the most expensive step, for which more sophisticated methods could be envisaged.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Two-Step Numerical Scheme in Time for Surface Quasi Geostrophic Equations Under Location Uncertainty**

#### **Camilla Fiorini, Pierre-Marie Boulvard, Long Li, and Etienne Mémin**

**Abstract** In this work we consider the surface quasi-geostrophic (SQG) system under location uncertainty (LU) and propose a Milstein-type scheme for these equations, which is then used in a multi-step method. The SQG system considered here consists of one stochastic partial differential equation, which models the stochastic transport of the buoyancy, and a linear operator linking the velocity and the buoyancy. In the LU setting, the Euler-Maruyama scheme converges with weak order 1 and strong order 0.5. Our aim is to develop higher-order schemes in time, based on a Milstein-type scheme in a multi-step framework. First, we compare different kinds of Milstein schemes. The scheme with the best performance is then included in the two-step scheme. Finally, we show how our two-step scheme decreases the error in comparison to other multi-step schemes.

#### **1 Introduction**

The main aim of modelling under location uncertainty (LU) is to simulate on coarse meshes an enriched system mimicking high-resolution deterministic chaotic dynamics. Such LU models allow one to recover phenomena such as backscattering, dissipation and reorganisation on very coarse meshes. Furthermore, the framework provides a natural setting for uncertainty quantification analysis [14]. The LU framework, first introduced in [11], is based on the decomposition of the Lagrangian velocity into two components: a large-scale smooth component and

C. Fiorini (-)
Laboratoire M2N, Conservatoire National des Arts et Métiers, Paris, France
e-mail: camilla.fiorini@lecnam.net

P.-M. Boulvard
Inria Paris, Equipe ANGE, Paris, France

L. Li · E. Mémin
Inria Rennes - Bretagne Atlantique, Équipe FLUMINANCE, Rennes, France

a small-scale fast oscillating one. This decomposition leads to a stochastic transport operator, and one can, in turn, develop stochastic versions of classical fluid-dynamics systems derived from the Navier–Stokes equations. The SQG system in particular consists of one stochastic partial differential equation (SPDE), which models the stochastic transport of the buoyancy, and a linear operator relating the velocity and the buoyancy:

$$\begin{cases} \mathrm{d}b_t = \frac{1}{2}\nabla\cdot(\boldsymbol{a}\nabla b_t)\,\mathrm{d}t - \boldsymbol{v}^*\cdot\nabla b_t\,\mathrm{d}t - \nabla b_t\cdot\boldsymbol{\sigma}\mathrm{d}\mathbf{B}_t,\\ b_t = N(-\Delta)^{1/2}\psi,\\ \mathbf{u} = \nabla^{\perp}\psi, \end{cases} \tag{1}$$

where $b_t$ is the buoyancy at time *t*, **u** the large-scale smooth velocity, *N* a constant depending on the vertical oscillation frequency of the buoyancy and on the Coriolis parameter, **B** a Wiener process, *ψ* the stream function, and $\boldsymbol{v}^* = \boldsymbol{u} - \frac{1}{2}\nabla\cdot\boldsymbol{a} + \boldsymbol{\sigma}\nabla\cdot\boldsymbol{\sigma}$ a corrected velocity associated with the effect of the noise inhomogeneity on the advected variables. The spatial correlations of the noise are given through an integral kernel operator *σ* (here assumed deterministic and symmetric for the sake of simplicity), and the variance matrix *a*, given by the matrix kernel of the operator *σσ*, provides a local measure of the noise strength. For more details on the derivation of this system, see [10, 13]. In the rest of this work we will mainly focus on the first equation; the last two will be condensed into **u** = *H(b)*. Concerning the modelling of the noise, we use the equivalent, convenient spectral definition:

$$\sigma\,\mathrm{d}\mathbf{B}_t = \sum_m \varphi^m\,\mathrm{d}\beta_t^m,$$

where $\beta^m = \beta^m(t)$ are independent one-dimensional standard Brownian motions and $\boldsymbol{\varphi}^m = [\varphi^m_x, \varphi^m_y]^T(\boldsymbol{x})$ are basis functions. The number of terms involved in the sum is in theory infinite, but in numerical applications a truncation is considered; in the definition of the numerical schemes we will thus assume that it is a finite sum. For the computation of the basis functions, two strategies are possible: an offline strategy, where they are defined from the eigenfunctions of an empirical covariance tensor built from high-resolution data, as described in [10, 13]; or an online strategy, where the functions are updated during the simulation and are in this case a function of the buoyancy *b*. With this representation, the variance tensor reads:

$$a = \sum\_{m} \boldsymbol{\varphi}^{m} (\boldsymbol{\varphi}^{m})^{T}.$$
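The truncated spectral representation of the noise and the associated variance tensor can be sketched as follows (a toy illustration with randomly chosen basis functions `phi`, not the empirically derived modes of [10, 13]):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical truncated basis: M modes, each a 2-component field on a
# flattened grid of P points, stacked as vectors of length 2P.
M, P = 4, 16
phi = rng.standard_normal((M, 2 * P))

# Variance tensor a = sum_m phi^m (phi^m)^T, symmetric positive
# semi-definite by construction.
a = sum(np.outer(phi[m], phi[m]) for m in range(M))

# One sample of the noise increment sigma dB = sum_m phi^m dbeta^m,
# with independent increments dbeta^m ~ N(0, dt).
dt = 1e-3
dbeta = np.sqrt(dt) * rng.standard_normal(M)
noise_increment = phi.T @ dbeta

print(np.allclose(a, a.T), noise_increment.shape)  # True (32,)
```

In an actual simulation each `phi[m]` would be a discretised velocity field and the outer products would be taken pointwise, but the algebraic structure is the same.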

#### **2 Numerical Schemes**

In this section we derive a two-step numerical scheme in time for the SQG system under LU (SQG-LU). We compare this scheme to other multi-step schemes for the SPDE, in particular the ones developed in [5] and [4], and show how our scheme improves the precision. Concerning the discretisation in space, standard spectral methods are used: the linear terms are treated in Fourier space, whilst the nonlinear terms are discretised in physical space.

The derivation of the time scheme consists of two steps: first, we derive a class of Milstein schemes for SQG-LU and empirically verify their convergence; then, a two-step scheme is proposed.

#### *2.1 Derivation of a Milstein Scheme*

To design the Milstein schemes, we consider the integral form of the SPDE in (1), namely

$$b_t = b_{t_0} + \int_{t_0}^{t}\left(\frac{1}{2}\nabla\cdot(\boldsymbol{a}\nabla b_s) - \boldsymbol{v}^*\cdot\nabla b_s\right)\mathrm{d}s - \int_{t_0}^{t}\sum_m \nabla b_s\cdot\boldsymbol{\varphi}^m\,\mathrm{d}\beta_s^m, \tag{2}$$

and we can define the following functions:

$$f(b_t, t) = \frac{1}{2}\nabla\cdot(\boldsymbol{a}\nabla b_t) - \boldsymbol{v}^*\cdot\nabla b_t \qquad\text{and}\qquad g^m(b_t, t) = -\nabla b_t\cdot\boldsymbol{\varphi}^m. \tag{3}$$

We can now use the functional extension of the Itô formula [3] for both *f* and *g* to write their differential forms:

$$\begin{aligned} f(b_t, t) = f(b_{t_0}, t_0) &+ \int_{t_0}^{t}\frac{\partial f}{\partial s}(b_s, s)\,\mathrm{d}s + \int_{t_0}^{t}\frac{\partial f}{\partial b}(b_s, s)\,\mathrm{d}b_s\\ &+ \frac{1}{2}\int_{t_0}^{t}\frac{\partial^2 f}{\partial b^2}(b_s, s)\,\mathrm{d}\langle b, b\rangle_s \end{aligned} \tag{4}$$

$$\begin{aligned} g^m(b_t, t) = g^m(b_{t_0}, t_0) &+ \int_{t_0}^{t}\frac{\partial g^m}{\partial s}(b_s, s)\,\mathrm{d}s + \int_{t_0}^{t}\frac{\partial g^m}{\partial b}(b_s, s)\,\mathrm{d}b_s\\ &+ \frac{1}{2}\int_{t_0}^{t}\frac{\partial^2 g^m}{\partial b^2}(b_s, s)\,\mathrm{d}\langle b, b\rangle_s \end{aligned} \tag{5}$$

We remark that, since the basis $\boldsymbol{\varphi}^m$ is constant in time, so is $\boldsymbol{a}$, and the functions $f$ and $g^m$ do not depend explicitly on time; therefore $\partial f/\partial t = \partial g^m/\partial t = 0$.

Concerning the first derivatives with respect to *b*, they have to be interpreted as Fréchet derivatives. The Fréchet derivative of an operator *F* is the bounded linear operator $DF(\overline{x})$ which satisfies the following relation:

$$\lim\_{\|h\|\to 0} \frac{\|F(\overline{x} + h) - F(\overline{x}) - DF(\overline{x})h\|}{\|h\|} = 0,\tag{6}$$

which implies that for a linear operator $DF(\overline{x})h = F(h)$. We start with *g* and use the fact that **∇** is a linear operator:

$$\frac{\partial g^m}{\partial b}(\overline{b})b = -\nabla b\cdot\boldsymbol{\varphi}^m - \nabla b\cdot\frac{\partial\boldsymbol{\varphi}^m}{\partial b}\,. \tag{7}$$

If the basis is computed offline, *ϕ<sup>m</sup>* does not depend on *b* and therefore the second term in (7) is zero. If the basis is computed online and *ϕ<sup>m</sup>* does depend on *b*, we can rewrite the second term of the sum by components and, using the chain rule, one has:

$$\nabla b\cdot\frac{\partial\boldsymbol{\varphi}^m}{\partial b} = \frac{\partial b}{\partial x}\frac{\partial\varphi_x^m}{\partial b} + \frac{\partial b}{\partial y}\frac{\partial\varphi_y^m}{\partial b} = \nabla\cdot\boldsymbol{\varphi}^m. \tag{8}$$

For the second term of *f* , i.e. *v*<sup>∗</sup> **· ∇***b*, the same considerations are valid. To compute the derivative of the first term of *f* , we remark that it is a composition and product of three operators, two of which are linear. We can define:

$$F_1(h) = \frac{1}{2}\nabla\cdot h, \quad F_2(b) = \boldsymbol{a}(b), \quad F_3(b) = \nabla b. \tag{9}$$

Using the chain rule and the linearity of *F*<sup>1</sup> and *F*<sup>3</sup> one has:

$$\begin{aligned} D\left(F_1(F_2(b)F_3(b))\right)b &= DF_1(F_2(b)F_3(b))\left(DF_2(b)F_3(b) + F_2(b)DF_3(b)\right)b\\ &= F_1\left(F_3(b)DF_2(b)b + F_2(b)F_3(b)\right)\\ &= \frac{1}{2}\nabla\cdot\left(\frac{\partial\boldsymbol{a}}{\partial b}\nabla b + \boldsymbol{a}\nabla b\right). \end{aligned} \tag{10}$$

Finally, with the same considerations used above, we remark that we can write *(∂a/∂b)***∇***b* = **∇ ·** *a*. Therefore:

$$\frac{\partial f}{\partial b}(\overline{b})b = f(b) + \frac{1}{2}\nabla\cdot\nabla\cdot\boldsymbol{a} - \nabla\cdot\boldsymbol{v}^*, \qquad \frac{\partial g^m}{\partial b}(\overline{b})b = g^m(b) - \nabla\cdot\boldsymbol{\varphi}^m. \tag{11}$$

As for the Itô covariation bracket, one has:

$$\langle b, b\rangle_t = \Big\langle \int_{t_0}^{\cdot}\sum_m g^m(b_s, s)\,\mathrm{d}\beta_s^m,\ \int_{t_0}^{\cdot}\sum_k g^k(b_\tau, \tau)\,\mathrm{d}\beta_\tau^k \Big\rangle_t = \int_{t_0}^{t}\sum_m \left(g^m(b_s, s)\right)^2\mathrm{d}s$$

We now suppose that we are in one of the following two cases: either the basis functions $\boldsymbol{\varphi}^m$ are computed offline and thus do not depend on $b$, or the noise and the large-scale velocity are divergence-free, $\nabla\cdot\boldsymbol{\varphi}^m = 0$ and $\nabla\cdot\boldsymbol{u} = 0$.



It can be noticed that the first case corresponds to a noise defined from external high-resolution data (and thus independent of the solution), while the second case boils down to imposing an incompressibility constraint on the large-scale component, $\nabla\cdot\boldsymbol{u} = 0$, which is indeed often considered in practice with a particular scaling of the noise [1, 2]. With these assumptions, we then have:

$$\frac{\partial f}{\partial b} = \frac{\partial^2 f}{\partial b^2} = f, \qquad \frac{\partial g^m}{\partial b} = \frac{\partial^2 g^m}{\partial b^2} = g^m. \tag{12}$$

We can now replace all these expressions into (4) and (5), and then (4) and (5) into (2). Keeping only the terms of order one or lower, we obtain:

$$b_t = b_{t_0} + f(b_{t_0})\Delta t + \sum_m g^m(b_{t_0})\Delta\beta^m + \int_{t_0}^{t}\int_{t_0}^{s}\sum_{m,k} g^m(g^k(b_\tau))\,\mathrm{d}\beta_\tau^k\,\mathrm{d}\beta_s^m, \tag{13}$$

where $\Delta t = t - t_0$ and $\Delta\beta^m = \beta^m_t - \beta^m_{t_0}$. We define the following quantities:

$$G^{m,k} := g^m(g^k(b_{t_0})), \qquad I^{m,k} := \int_{t_0}^{t}\int_{t_0}^{s}\mathrm{d}\beta_\tau^k\,\mathrm{d}\beta_s^m;$$

then the double iterated Itô integral in (13) can be approximated as follows:

$$\sum\_{m,k} G^{m,k} I^{m,k} = \sum\_{m,k} G^{m,k} \frac{I^{m,k} + I^{k,m}}{2} + G^{m,k} \frac{I^{m,k} - I^{k,m}}{2}.$$

The first, symmetric term can be computed analytically from the Itô integration-by-parts formula, $I^{m,k} + I^{k,m} = \Delta\beta^m\Delta\beta^k - \delta_{m,k}\Delta t$; however, the second, antisymmetric term $(I^{m,k} - I^{k,m})/2 =: A^{m,k}_{t_0,t}$ cannot, and is known as the Lévy area.
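The integration-by-parts identity for the symmetric part can be checked numerically by approximating the iterated integrals with fine Riemann sums (a sanity-check sketch, not part of the scheme itself):

```python
import numpy as np

rng = np.random.default_rng(2)

# Approximate the iterated integrals I^{m,k} over [0, T] with left-point
# Riemann sums on a fine grid and check the symmetric-part identity
# I^{m,k} + I^{k,m} = dbeta^m dbeta^k - delta_{m,k} T.
N, T = 200_000, 1.0
dt = T / N
db = np.sqrt(dt) * rng.standard_normal((2, N))   # two independent paths
B = np.cumsum(db, axis=1)

def iterated(m, k):
    """Left-point Riemann sum for I^{m,k} = int int dbeta^k dbeta^m."""
    Bk = np.concatenate(([0.0], B[k, :-1]))      # beta^k at left endpoints
    return np.sum(Bk * db[m])

I01, I10 = iterated(0, 1), iterated(1, 0)
dB0, dB1 = B[0, -1], B[1, -1]

print(abs(I01 + I10 - dB0 * dB1))                # off-diagonal: ~0
print(abs(2 * iterated(0, 0) - (dB0**2 - T)))    # diagonal: ~0
```

Only the antisymmetric combination, the Lévy area, resists this kind of closed-form reduction, which is why it needs dedicated simulation methods.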

#### **2.1.1 Lévy Area Simulation**

In this subsection, we briefly introduce the methods we used to simulate the Lévy area. More details can be found in [6, 8], where these methods were proposed. The first method to simulate the Lévy area will be referred to as the weak approximation in the rest of this work: in this method, we simulate a random variable that has the same moments as the Lévy area. The second method, which will be referred to as the conditional method, is a recursive method: the time interval *(t*0*,t)* is recursively split into two subintervals of the same length, and the two following relations are used:

$$A_{t_0,t}^{m,k} = A_{t_0,u}^{m,k} + A_{u,t}^{m,k} + \frac{1}{2}\left((\beta_u^m - \beta_{t_0}^m)(\beta_t^k - \beta_u^k) - (\beta_u^k - \beta_{t_0}^k)(\beta_t^m - \beta_u^m)\right) \tag{14}$$

$$\mathbb{E}\left[A_{t_0,t}\,\middle|\,\mathbf{B}_t - \mathbf{B}_{t_0}\right] = 0.$$

For more details on these two methods, see [7]. Finally, we consider a third approach, where we neglect the Lévy area. We remark that this approach is exact if *Gm,k* <sup>=</sup> *Gk,m*, which is not the case here.
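A minimal sketch of the conditional, recursive method is given below: the interval is split dyadically, the midpoint increments are drawn from a Brownian bridge, relation (14) combines the two halves, and at the deepest level the Lévy area is replaced by its conditional expectation, zero. The recursion depth and sample count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(7)

def levy_area(db_m, db_k, h, depth, rng):
    """Recursively approximate the Levy area A^{m,k} over an interval of
    length h, given the two Brownian increments db_m and db_k, using
    relation (14); at the deepest level the area is replaced by its
    conditional expectation, 0."""
    if depth == 0:
        return 0.0
    # Brownian bridge: given the full increment db, the increment over
    # the first half-interval is N(db / 2, h / 4).
    s = 0.5 * np.sqrt(h)
    dm1 = 0.5 * db_m + s * rng.standard_normal()
    dk1 = 0.5 * db_k + s * rng.standard_normal()
    dm2, dk2 = db_m - dm1, db_k - dk1
    A1 = levy_area(dm1, dk1, 0.5 * h, depth - 1, rng)
    A2 = levy_area(dm2, dk2, 0.5 * h, depth - 1, rng)
    return A1 + A2 + 0.5 * (dm1 * dk2 - dk1 * dm2)

# Sanity check: over [0, 1] the Levy area of two independent Brownian
# motions has mean 0 and variance 1/4.
h, M = 1.0, 5000
samples = [levy_area(*(np.sqrt(h) * rng.standard_normal(2)), h, 6, rng)
           for _ in range(M)]
print(np.mean(samples), np.var(samples))   # ~ 0.0 and ~ 0.25
```

The truncation at finite depth removes the finest-scale contributions, so the sample variance falls slightly below the exact value; deeper recursions shrink this deficit geometrically.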

#### *2.2 Multi-Step Schemes*

We next propose a two-step scheme in which the Milstein method is used as the prediction step and the Euler method as the correction step; it reads:

$$\begin{cases} b_t^* = b_{t_0} + f(b_{t_0}, \mathbf{u}_{t_0})\Delta t + \sum_m g^m(b_{t_0})\Delta\beta^m + \sum_{m,k} G^{m,k}\left(S_{t_0,t}^{m,k} + \tilde{A}_{t_0,t}^{m,k}\right)\\ \mathbf{u}_t^* = \mathcal{H}(b_t^*)\\ b_t = \frac{1}{2}b_{t_0} + \frac{1}{2}\Big(b_t^* + f(b_t^*, \mathbf{u}_t^*)\Delta t + \sum_m g^m(b_t^*)\Delta\beta^m\Big)\\ \ldots \end{cases} \tag{15}$$

where $S^{m,k}_{t_0,t} := (\Delta\beta^m\Delta\beta^k - \delta_{m,k}\Delta t)/2$ and $\tilde{A}^{m,k}_{t_0,t}$ is one of the approximations of the Lévy area described in the previous subsection. This scheme will be referred to as SRK2-EM (EM stands for Euler-Milstein, not for Euler-Maruyama) in the rest of the paper.
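The predictor-corrector structure of SRK2-EM can be sketched on a scalar toy SDE with a single noise mode (so the Lévy area vanishes and Ã = 0) and no velocity coupling; this illustrates the time-stepping pattern only, not the full SPDE solver:

```python
import numpy as np

# One step of the two-step scheme (15) for a scalar toy SDE
# db = f(b) dt + g(b) dbeta with a single noise mode, so that the Levy
# area vanishes and A-tilde = 0 (the Milstein-0 choice).
def srk2_em_step(b0, dt, dbeta, f, g, G):
    S = 0.5 * (dbeta**2 - dt)            # symmetric part S^{m,m}
    # Prediction: one Milstein step.
    b_star = b0 + f(b0) * dt + g(b0) * dbeta + G(b0) * S
    # Correction: average of b0 and an Euler step from the predictor.
    return 0.5 * b0 + 0.5 * (b_star + f(b_star) * dt + g(b_star) * dbeta)

# Noise-free sanity check (g = G = 0): the scheme reduces to Heun's
# method, so one step matches exp(lam * dt) up to O(dt^3).
lam, dt = -1.0, 0.1
b1 = srk2_em_step(1.0, dt, 0.0,
                  f=lambda b: lam * b,
                  g=lambda b: 0.0,
                  G=lambda b: 0.0)
print(abs(b1 - np.exp(lam * dt)))        # ~ dt**3 / 6
```

In the SQG-LU setting, `f`, `g` and `G` would act on the discretised buoyancy field and the velocity would be refreshed through H(b) between the two stages, as in (15).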

In the next section, we first analyse the results of the Milstein schemes with the different Lévy area approximations in order to select the best one. Then, we compare our multi-step scheme to two other multi-step schemes developed in [5] and [4], which we briefly recall here. The first one, based on a third-order Runge-Kutta scheme (SSPRK3) [5], is:

$$\begin{cases} b^{(1)} = b_{t_0} + f_s(b_{t_0}, \mathbf{u}_{t_0})\Delta t + \sum_m g^m(b_{t_0})\Delta\beta^m\\ \mathbf{u}^{(1)} = \mathcal{H}(b^{(1)})\\ b^{(2)} = \frac{3}{4}b_{t_0} + \frac{1}{4}\left(b^{(1)} + f_s(b^{(1)}, \mathbf{u}^{(1)})\Delta t + \sum_m g^m(b^{(1)})\Delta\beta^m\right)\\ \mathbf{u}^{(2)} = \mathcal{H}(b^{(2)})\\ b_t = \frac{1}{3}b_{t_0} + \frac{2}{3}\left(b^{(2)} + f_s(b^{(2)}, \mathbf{u}^{(2)})\Delta t + \sum_m g^m(b^{(2)})\Delta\beta^m\right) \end{cases} \tag{16}$$

where $f_s = f - \nabla\cdot(\boldsymbol{a}\nabla b)/2$ denotes the modified drift of the Stratonovich formulation. The second one, which relies on the Euler-Heun method [4], also for the Stratonovich integral, reads:

$$\begin{cases} b^{(1)} = b_{t_0} + f_s(b_{t_0}, \mathbf{u}_{t_0})\Delta t + \sum_m g^m(b_{t_0})\Delta\beta^m\\ \mathbf{u}^{(1)} = \mathcal{H}(b^{(1)})\\ b_t = \frac{1}{2}b_{t_0} + \frac{1}{2}\left(b^{(1)} + f_s(b^{(1)}, \mathbf{u}^{(1)})\Delta t + \sum_m g^m(b^{(1)})\Delta\beta^m\right) \end{cases} \tag{17}$$

#### **3 Numerical Results**

In this section we show some numerical results. First, the effect of the different approximations of the Lévy area is studied on the Milstein scheme. Then, the multi-step scheme is assessed and compared to the ones already proposed in the literature. We focus on two variations of one specific test case, plotted in Fig. 1: the initial condition (left) consists of two warm elliptical anticyclones at the bottom of the domain and two cold elliptical cyclones at the top. After one day under moderate noise (centre), the four structures have rotated by approximately 45°. After one day under strong noise (right), the nonlinearity of the dynamics is more noticeable. All the configuration details used for these simulations can be found in Chapter 6 of [10] for the moderate-noise configuration. For the strong noise, all the basis functions $\boldsymbol{\varphi}^m$ are multiplied by a factor 10.

We will use the following abbreviations for the different numerical schemes


**Fig. 1** Euler-Maruyama simulation of system (1) on a 128 × 128 spatial grid

**Fig. 2** RMSE (normalised by the buoyancy amplitude $B_0 = 10^{-3}$ m/s<sup>2</sup>) of the different schemes during 30 days of simulation under moderate noise

**Fig. 3** Convergence of different schemes under weak and strong noise. Order 1 in dotted black, order 0*.*5 in dashed black

In Figs. 2 and 3 one can see the differences between the Euler-Maruyama scheme and all the Milstein schemes proposed. In Fig. 2 we plot, for each scheme and over a period of 30 days, the root mean squared error (RMSE), defined as:

$$\text{RMSE} = \frac{1}{|\Omega|}\,\mathbb{E}\Big[\left\|b_h - b\right\|_{L^2(\Omega)}^2\Big]^{1/2}, \tag{18}$$

where *Ω* denotes the spatial domain, $b_h$ is the numerical solution of the stochastic system (1), and *b* stands for the reference solution downsampled from a high-resolution deterministic simulation (recall that the aim of the stochastic setting is to reproduce high-resolution deterministic simulations on a coarse grid). The downsampling procedure consists of a low-pass filtering performed in the Fourier domain followed by a subsampling operation. The expectations are estimated from 30 realizations. These results are obtained with a *Δt* twice as small for the Euler scheme as for the other schemes. One can observe that Milstein-0 performs slightly better than the other Milstein schemes.
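The RMSE of Eq. (18) can be sketched as follows, with a synthetic ensemble standing in for the coarse-grid solutions (grid size, ensemble size and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic ensemble of E coarse-grid fields b_h around a reference b on
# an n x n grid over a domain of area L*L; the L2 norm in (18) is
# approximated by a Riemann sum, the expectation by an ensemble average.
n, E, L = 64, 30, 1.0
dA = (L / n) ** 2
b_ref = rng.standard_normal((n, n))
b_h = b_ref + 0.1 * rng.standard_normal((E, n, n))

def rmse(ensemble, reference):
    sq_norms = np.sum((ensemble - reference) ** 2, axis=(1, 2)) * dA
    return np.sqrt(np.mean(sq_norms)) / (L * L)

print(rmse(b_h, b_ref))                                # ~0.1, the noise level
print(rmse(np.repeat(b_ref[None], E, axis=0), b_ref))  # exactly 0.0
```

In the paper's setting the reference would be the filtered and subsampled high-resolution field rather than a synthetic one, but the estimator is the same.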

In Fig. 3, we show the rate of strong convergence *γ* of all the schemes discussed, under weak and strong noise. Since the exact solution is unknown, we use the following method [15] to estimate *γ* , for a sufficiently small *Δt*:

$$\gamma \simeq \log_2\left(\frac{e_1}{e_2}\right), \quad\text{with}\quad e_i := \mathbb{E}\left[\left\|b_h\left(T, \frac{\Delta t}{2^{i-1}}\right) - b_h\left(T, \frac{\Delta t}{2^{i}}\right)\right\|_{L^2(\Omega)}^2\right]^{1/2},$$

where $b_h(T, \Delta t)$ is the numerical solution at the final time *T* obtained with a time step *Δt*. It is important to underline that, in order for this method to work, the Brownian trajectories must be fixed. We applied this method for the time steps 30, 60, 120, 240, hence obtaining two estimates for *γ*. It is important to remark that the value of the time steps is given in seconds, while the time scale of the studied phenomenon is of the order of one day. For reference, the CFL condition for this problem at the initial time would give a time step of around 300 s; the smallest time step we considered to obtain this estimate is ten times smaller than this. As one can see from Fig. 3, under weak noise all the one-step schemes provide almost identical results, and all the multi-step schemes are very similar: it is hard to distinguish among the different numerical schemes proposed. In particular, for the considered span of time steps, the error of the Euler scheme under moderate noise displays a linear trend, and the prevailing convergence order in this case is one. The reason for this is explained in the Appendix.
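The dyadic estimation of γ with fixed Brownian trajectories can be sketched on a scalar toy SDE (geometric Brownian motion here, standing in for the SPDE; all parameters are illustrative). The key point is that the coarse solutions reuse the same fine-path increments:

```python
import numpy as np

rng = np.random.default_rng(5)

# Estimate the strong order gamma = log2(e1/e2) of Euler-Maruyama on a
# scalar toy SDE dX = lam*X dt + sig*X dW (geometric Brownian motion),
# reusing the SAME Brownian increments at every resolution.
lam, sig, T = -0.5, 1.0, 1.0
M, N_fine = 2000, 512                       # paths, finest resolution

def euler_maruyama(dW):
    """Vectorised Euler-Maruyama over paths of increments dW, X_0 = 1."""
    dt = T / dW.shape[1]
    X = np.ones(dW.shape[0])
    for i in range(dW.shape[1]):
        X = X + lam * X * dt + sig * X * dW[:, i]
    return X

dW_fine = rng.standard_normal((M, N_fine)) * np.sqrt(T / N_fine)
# Coarsen the same paths by summing consecutive fine increments.
sols = {n: euler_maruyama(dW_fine.reshape(M, n, -1).sum(axis=2))
        for n in (128, 256, 512)}

e1 = np.sqrt(np.mean((sols[128] - sols[256]) ** 2))
e2 = np.sqrt(np.mean((sols[256] - sols[512]) ** 2))
gamma = np.log2(e1 / e2)
print(gamma)   # expected near the strong order 0.5 for multiplicative noise
```

Summing consecutive fine increments is what "fixing the Brownian trajectories" means operationally: every resolution sees the same underlying path.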

Under strong noise, it is easier to see the differences among the schemes. Milstein-weak is a slight improvement over Euler-Maruyama, but its rate of convergence is far from 1. Milstein-0 has the highest rate of convergence among all the schemes.

In conclusion, Milstein-0 seems to perform better than the other Milstein schemes. Furthermore, it is less computationally demanding. For these reasons, we built our two-step scheme on Milstein-0.

In Fig. 3 we also compare the multi-step schemes mentioned above: they all have a similar behaviour, with a rate of convergence 0*.*5 ≤ *γ* ≤ 1, but a much smaller error when compared to the one-step schemes. In particular, the two-step scheme proposed in this work (SRK2-EM in the figures) yields the smallest error of all for this test case. The SRK2-EM scheme also yields the smallest RMSE (cf. Fig. 2).

#### **4 Conclusion and Perspectives**

The Milstein schemes analysed in this work improve the numerical results, in particular when used in a multi-step framework. The Lévy area does not seem to play a key role in these test cases, which allows us to drastically reduce the computational costs. It must be pointed out that under weak noise, all the schemes tested provide very similar results. Ongoing and future work includes understanding the (non-)importance of the Lévy area and whether it is related to the test case, the equations, or other factors.

#### **Appendix: Convergence of Euler-Maruyama Scheme Under Moderate Noise**

To study the behaviour of our system under moderate noise, we use the formalism of [12]; in particular, we write our system in the following generic form:

$$dX\_t = a(\mathbf{x}, t)dt + \epsilon b(\mathbf{x}, t)dW\_t + \epsilon^2 c(\mathbf{x}, t)dt, \quad t \in [0, T] \tag{19}$$

with *a*, *b*, *c* being jointly *L*2-measurable in *(x, t)*, Lipschitz and of bounded linear growth in *x*.

Let $Y^\delta_\cdot$ be an Euler-Maruyama integration scheme for $X_\cdot$ with integration step *δ*. Then we may prove, in a similar fashion to Theorem 4.5.4 in [9], that:

$$\begin{array}{l} 1.\ \mathbb{E}[X_t^2] \le C, \quad \forall t\in[0,T],\\[4pt] 2.\ \mathbb{E}\left[|X_{t+\delta} - Y^\delta_{t+\delta}|\,\middle|\,X_t = x\right] \le K(x)\left(\delta + \sqrt{\epsilon}\sqrt{\delta} + o(\delta)\right). \end{array}$$

Using this and the Lipschitz continuity of the coefficients in (19), we may prove a result to some extent similar to Theorem 2.1 in [12], namely that

$$\mathbb{E}\left[\sup_{t_0\le t\le T}|X_t - Y^\delta_t|\,\middle|\,X_{t_0} = x\right] \le K'(x)\left(\delta + \sqrt{\epsilon}\sqrt{\delta} + o(\delta)\right). \tag{20}$$

In light of this estimate, we may interpret the convergence rate displayed in Fig. 3 as a case where *δ* is not small enough compared to $\epsilon$, so that $\sqrt{\epsilon}\sqrt{\delta}$ does not necessarily prevail over *δ*, which is evidenced by the linear rate of convergence.
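The interplay between the two terms of the bound can be made concrete with a quick numerical comparison (the value of `eps` is hypothetical):

```python
import math

# Two leading terms of the bound in (20) for a hypothetical small noise
# amplitude eps: the delta-term dominates as soon as delta > eps, and
# the observed convergence order is then one.
eps = 1e-4
for delta in (1e-1, 1e-2, 1e-3, 1e-5):
    dominates = delta > math.sqrt(eps * delta)
    print(f"delta={delta:.0e}  sqrt(eps*delta)={math.sqrt(eps * delta):.1e}  "
          f"delta term dominates: {dominates}")
```

The crossover happens exactly at `delta = eps`: only time steps smaller than the noise amplitude reveal the order-0.5 regime.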

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **The Dissipation Properties of Transport Noise**

**Franco Flandoli and Eliseo Luongo**

**Abstract** The aim of this work is to present, in a compact way, the latest results about the dissipation properties of transport noise in fluid mechanics. Starting from the reasons why transport noise is natural in a passive scalar equation for heat diffusion and transport, several results about noise-enhanced dissipation are presented. Rigorous statements are matched with numerical experiments, showing that the stated sufficient conditions are not yet optimal but give a useful first indication.

**Keywords** Dissipation by noise · Turbulence · Eddy diffusion · Vortex patch · Transport noise · Dirichlet boundary condition

#### **1 Introduction**

In the last four years, a new understanding has been developed of heat diffusion in a turbulent fluid modeled by white noise. This model has the interesting feature of properly describing the dissipation properties of a turbulent fluid. The equation for heat diffusion and transport, with a heat source *q*, is

$$\partial_t\theta + \boldsymbol{u}\cdot\nabla\theta = \kappa\,\Delta\theta + q, \tag{1}$$

where *θ* = *θ (t, x)* is the temperature, *κ* is the diffusion constant and *u* = *u (t, x)* is the velocity field of the fluid. The turbulent fluid is described a priori by a random field, Gaussian and white in time, with a given covariance structure (hence the temperature is a passive scalar). In this review we consider the following description for *u*:

F. Flandoli (-) · E. Luongo
Scuola Normale Superiore, Pisa, Italy
e-mail: franco.flandoli@sns.it

$$\boldsymbol{u}\left(t,\mathbf{x}\right) = \sum_{k\in K}\sigma_k\left(\mathbf{x}\right)\frac{\mathrm{d}W_t^k}{\mathrm{d}t} \tag{2}$$

where the *σk* are divergence-free vector fields satisfying no-slip boundary conditions and the $W^k_t$ are independent Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, \mathbb{P})$; for simplicity, assume *K* is a finite set, but the case of a countable set can be treated without trouble at the price of additional summability assumptions. Some rigorous justification for describing the velocity of a turbulent fluid by Eq. (2) is available in Sect. 2.2; here we just want to give some ideas. Let us denote by $\boldsymbol{u}^{\nu,\varepsilon}$ the solution, in a domain *D* with boundary, of the SPDE

$$\begin{cases} \partial_t\boldsymbol{u}^{\nu,\varepsilon} + \nabla p^{\nu,\varepsilon} = \nu\Delta\boldsymbol{u}^{\nu,\varepsilon} - \frac{1}{\varepsilon}\boldsymbol{u}^{\nu,\varepsilon} + \frac{1}{\varepsilon}\sum_{k\in K}\sigma_k\,\partial_t W_t^k\\ \operatorname{div}\left(\boldsymbol{u}^{\nu,\varepsilon}\right) = 0\\ \boldsymbol{u}^{\nu,\varepsilon}\big|_{\partial D} = 0, \end{cases} \tag{3}$$

where the terms $-\frac{1}{\varepsilon}\boldsymbol{u}^{\nu,\varepsilon} + \frac{1}{\varepsilon}\sum_{k\in K}\sigma_k\,\partial_t W_t^k$ describe the roughness of the boundary, as stated in Sect. 2.2. Let, moreover, $W^{\nu,\varepsilon}_t = \int_0^t \boldsymbol{u}^{\nu,\varepsilon}(s)\,\mathrm{d}s$; then it can be proven that

$$\lim_{\varepsilon\to 0}\mathbb{E}\left[\sup_{t\in[0,T]}\Big\|W_t^{\nu,\varepsilon} - \sum_{k\in K}\sigma_k W_t^k\Big\|_{L^2(D)}^2\right] = 0,$$

see for example [6].

The correct interpretation of Eq. (1) when *u* has the form (2) is the Stratonovich equation

$$\mathrm{d}\theta + \sum_{k\in K}\sigma_k\cdot\nabla\theta \circ \mathrm{d}W_t^k = (\kappa\,\Delta\theta + q)\,\mathrm{d}t \tag{4}$$

or equivalently the Itô equation with corrector L*θ* given by the second order differential operator (7) below:

$$d\theta + \sum\_{k \in K} \sigma\_k \cdot \nabla \theta \, dW\_t^k = (\kappa \, \Delta \theta + \mathcal{L}\theta + q) \, dt. \tag{5}$$

There are some motivations for the analysis of Eq. (4) based on the idea of extending to SPDEs the remarkable Wong-Zakai principle [20]; see for example [2, 3, 14, 15, 16, 18, 19].

Assuming that the external source *q* and the initial temperature *θ*<sup>0</sup> are deterministic, under suitable mild assumptions the deterministic function

$$\Theta\left(t,x\right) = \mathbb{E}\left[\theta\left(t,x\right)\right],$$

is the solution of the deterministic parabolic equation

$$\partial_t\Theta = (\kappa\Delta + \mathcal{L})\,\Theta + q, \tag{6}$$

where $\mathbb{E}$ denotes the mathematical expectation on $(\Omega, \mathcal{F}, \mathbb{P})$. The main results of the last years are quantitative estimates on the difference *θ* − *Θ*, convergence properties of the solution of Eq. (5) to the stationary solution of Eq. (6), and the enhanced dissipative properties of the second-order differential operator *κΔ* + L, see [7, 9, 10]. These results properly explain the dissipation properties of transport noise and are the core of this review article.

In Sect. 2 we will present some motivations for the analysis of Eq. (4) as a good model for the heat diffusion in a turbulent fluid and we will introduce the main notations. In Sect. 3 we will present the main results, referring to [7, 9, 10] for some rigorous proofs. Lastly, in Sect. 4 we will present some cases where the coefficients *σk* introduce more dissipation in the model with respect to the theoretical predictions made by the rigorous sufficient conditions, exploiting real computations or numerical simulations following the ideas of [7, 10].

*Remark 1* In this review we only considered the effects of the transport noise on passive scalars. Actually, some results can be stated also for the scalar vorticity of the fluid itself, in two space dimensions. We refer to [8, 11, 12] for further readings. The case of the influence on vector fields is much more difficult and still to be understood.

#### **2 Well-Posedness and Motivations**

#### *2.1 Notations and Definitions*

In this review we will denote by *D* a 2D domain with boundary, either a smooth bounded open set or an infinite 2D channel, namely <sup>R</sup> <sup>×</sup> *(*−1*,* <sup>1</sup>*)*. We write the coordinates using the notation

$$\mathbf{x} = (x_1, z)\in D.$$

Let *Z* be a separable Hilbert space, denote by $L^2(\mathcal{F}\_{t\_0}, Z)$ the space of square integrable random variables with values in *Z*, measurable with respect to $\mathcal{F}\_{t\_0}$. Moreover, denote by $C\_{\mathcal{F}}([0, T]; Z)$ the space of continuous adapted processes $(X\_t)\_{t \in [0, T]}$ with values in *Z* such that

$$\mathbb{E}\left[\sup\_{t\in[0,T]} \|X\_t\|\_Z^2\right] < \infty$$

and by $L^2\_{\mathcal{F}}(0, T; Z)$ the space of progressively measurable processes $(X\_t)\_{t \in [0, T]}$ with values in *Z* such that

$$\mathbb{E}\left[\int\_0^T \|X\_t\|\_Z^2 \, dt\right] < \infty.$$

Denote by $L^2(D)$ and $W^{k,2}(D)$ the usual Lebesgue and Sobolev spaces and by $W^{k,2}\_0(D)$ the closure in $W^{k,2}(D)$ of smooth compactly supported functions. Set $H = L^2(D)$, $V = W^{1,2}\_0(D)$, $D(A) = W^{2,2}(D) \cap V$. We denote by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$ the inner product and the norm in *H* respectively.

Assume that *K* is a finite set and $\sigma\_k \in D(A) \cap C^\infty\_b(D)^2$, $\nabla \cdot \sigma\_k = 0$, $k \in K$ (less is sufficient, but we do not stress this level of generality). Define the matrix-valued function

$$\mathcal{Q}\left(\mathbf{x},\mathbf{y}\right) = \sum\_{k \in K} \sigma\_k\left(\mathbf{x}\right) \otimes \sigma\_k\left(\mathbf{y}\right).$$

If we denote by *W (t,x)* the vector valued random field

$$W(t, \mathbf{x}) = \sum\_{k \in K} \sigma\_k(\mathbf{x}) \, W\_t^k$$

(the velocity field *u* given by (2) is the distributional time derivative of *W*) then we see that *Q (x, y)* is the space-covariance of *W (*1*, x)*:

$$\mathcal{Q}\left(\mathbf{x},\mathbf{y}\right) = \mathbb{E}\left[W\left(1,\mathbf{x}\right) \otimes W\left(1,\mathbf{y}\right)\right].$$
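This covariance identity is easy to check empirically. The sketch below discretizes a toy scalar analogue on a one-dimensional grid (the fields `sigma` are illustrative stand-ins, not the divergence-free coefficients of the text) and compares the sample covariance of $W(1,\cdot)$ with $Q$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar stand-ins for the coefficients sigma_k, sampled on a grid
xs = np.linspace(-1.0, 1.0, 21)
K = 5
sigma = np.array([np.sin((k + 1) * np.pi * xs) for k in range(K)])  # shape (K, n)

# Q(x, y) = sum_k sigma_k(x) sigma_k(y), evaluated on the grid
Q = sigma.T @ sigma                                                 # shape (n, n)

# Sample W(1, x) = sum_k sigma_k(x) W_1^k with i.i.d. standard Gaussians W_1^k
n_samples = 200_000
W1 = rng.standard_normal((n_samples, K)) @ sigma                    # shape (n_samples, n)

# The empirical covariance of W(1, .) should approximate Q
Q_emp = W1.T @ W1 / n_samples
max_err = np.abs(Q_emp - Q).max()
print(max_err)  # shrinks like 1/sqrt(n_samples)
```

The same check carries over verbatim to vector-valued fields by adding a component axis to `sigma`.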

The matrix-function *Q (x, x)* is elliptic:

$$\sum\_{i,j=1}^d \mathcal{Q}\_{ij}\left(\mathbf{x},\mathbf{x}\right)\xi\_i\xi\_j = \mathbb{E}\left[\left|W\left(1,\mathbf{x}\right)\cdot\xi\right|^2\right] \ge 0$$

for all $\xi = (\xi\_1, \dots, \xi\_d) \in \mathbb{R}^d$. Associated to it, define the bounded linear operator

$$\mathbb{Q}: L^2(D; \mathbb{R}^2) \to L^2(D; \mathbb{R}^2), \qquad (\mathbb{Q}v)(\mathbf{x}) = \int\_D \mathcal{Q}(\mathbf{x}, \mathbf{y}) v(\mathbf{y}) \, d\mathbf{y}$$

and the quantities:

$$\tilde{q}(\mathbf{x}) := \min\_{\xi \neq 0} \frac{\xi^T \mathcal{Q}(\mathbf{x}, \mathbf{x}) \xi}{\|\xi\|^2},$$

$$\varepsilon\_{\mathcal{Q}} := \|\mathbb{Q}^{1/2}\|\_{L^2(D; \mathbb{R}^2) \to L^2(D; \mathbb{R}^2)}^2.$$

Consider the divergence form elliptic operator L defined as

$$\left(\mathcal{L}\theta\right)\left(\mathbf{x}\right) = \frac{1}{2} \sum\_{i,j=1}^{d} \partial\_i \left(\mathcal{Q}\_{ij}\left(\mathbf{x}, \mathbf{x}\right) \partial\_j \theta\left(\mathbf{x}\right)\right) \tag{7}$$

for $\theta \in W^{2,2}(D)$. Define the linear operator $A : D(A) \subset H \to H$ as

$$A\theta = \left(\kappa \Delta + \mathcal{L}\right)\theta.$$

It is the infinitesimal generator of an analytic semigroup of negative type, see [1, 4, 13, 17], which we denote by $e^{tA}$, $t \ge 0$. Moreover, if *D* is bounded, we denote by $\kappa\lambda$ the first eigenvalue of $-\kappa\Delta$ and by $\kappa\lambda\_{\mathcal{L}}$ the first eigenvalue of $-(\kappa\Delta + \mathcal{L})$.

**Definition 1** Given $\theta\_0 \in L^2(\mathcal{F}\_0, H)$ and $q \in L^2(0, T; H)$, a stochastic process

$$\theta \in C\_{\mathcal{F}} \left( [0, T]; H \right) \cap L^2\_{\mathcal{F}} \left( 0, T; V \right)$$

is a mild solution of Eq. (5) if the following identity holds

$$\theta\left(t\right) = e^{tA}\theta\_0 + \int\_0^t e^{(t-s)A} q\left(s\right)ds - \sum\_{k \in K} \int\_0^t e^{(t-s)A} \sigma\_k \cdot \nabla \theta\left(s\right)dW\_s^k$$

for every $t \in [0, T]$, $\mathbb{P}$-a.s.

**Theorem 1** *For every* $\theta\_0 \in L^2(\mathcal{F}\_0, H)$ *and* $q \in L^2(0, T; H)$ *there exists a unique mild solution θ of Eq. (5). Moreover θ depends continuously on* $\theta\_0$ *and q.*

**Definition 2** Given $\theta\_0 \in L^2(\mathcal{F}\_0, H)$ and $q \in L^2(0, T; H)$, we say that a stochastic process *θ* is a weak solution of Eq. (5) if

$$\theta \in C\_{\mathcal{F}}([0, T]; H) \cap L^2\_{\mathcal{F}}(0, T; V)$$

and for every *φ* ∈ *D(A)*, we have

$$
\langle \theta(t), \phi \rangle = \langle \theta\_0, \phi \rangle + \int\_0^t \langle \theta(s), A\phi \rangle \, ds + \int\_0^t \langle q(s), \phi \rangle \, ds + \sum\_{k \in K} \int\_0^t \langle \theta(s), \sigma\_k \cdot \nabla \phi \rangle \, dW\_s^k
$$

for every $t \in [0, T]$, $\mathbb{P}$*-a.s.*

**Theorem 2** *θ is a weak solution of problem (5) if and only if it is a mild solution of problem (5). Moreover the Itô formula*

$$\begin{aligned} \|\theta(t)\|^2 - \|\theta(0)\|^2 &= 2\int\_0^t \langle \theta(s), q(s) \rangle \, ds + \sum\_{k \in K} \int\_0^t \|\sigma\_k \cdot \nabla \theta(s)\|^2 \, ds \\ &- 2\int\_0^t \langle (-A)^{\frac{1}{2}} \theta(s), (-A)^{\frac{1}{2}} \theta(s) \rangle \, ds \end{aligned}$$

*holds.*

These results are classical and can be found in [5, 10] together with several generalizations.
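To make Eq. (5) and the energy balance of Theorem 2 concrete, here is a minimal finite-difference sketch of a one-dimensional periodic caricature with a single constant coefficient *σ* (a toy model of my own choosing, not the boundary-value problem analysed in the text). The Stratonovich transport term contributes the Itô drift $\frac{\sigma^2}{2}\partial\_{xx}\theta$, the constant-coefficient version of the operator $\mathcal{L}$ of Eq. (7), while the noise itself is energy-conserving:

```python
import numpy as np

rng = np.random.default_rng(1)

# Periodic grid on [0, 2*pi)
n = 128
dx = 2 * np.pi / n
x = np.arange(n) * dx
kappa, sigma = 0.05, 0.5          # molecular diffusivity and noise intensity
dt, steps = 1e-3, 2000            # final time t = 2

def d1(f):   # centered first derivative, periodic
    return (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

def d2(f):   # second derivative, periodic
    return (np.roll(f, -1) - 2 * f + np.roll(f, 1)) / dx**2

theta = np.sin(x)                 # initial datum
for _ in range(steps):
    dW = np.sqrt(dt) * rng.standard_normal()
    # Ito form: drift (kappa + sigma^2/2) theta_xx, noise -sigma theta_x dW;
    # the sigma^2/2 part is the constant-coefficient operator L of Eq. (7).
    theta = theta + (kappa + 0.5 * sigma**2) * d2(theta) * dt - sigma * d1(theta) * dW

# The transport noise conserves ||theta||^2 pathwise (up to discretization error);
# only kappa dissipates the energy, while E[theta] decays at the enhanced rate.
energy = np.mean(theta**2)        # approximately 0.5 * exp(-2 * kappa * t)
print(energy)
```

Comparing the pathwise energy decay (rate *2κ*) with the decay of the mean (rate *2(κ + σ²/2)*) reproduces, in this caricature, exactly the gap between Eq. (5) and Eq. (6).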

#### *2.2 Motivations*

In this section we want to give some heuristics justifying Eq. (4) as a correct model for heat diffusion in a turbulent fluid. In the domain *D* we have a fluid with velocity *u* (pressure *p*, constant density = 1) and the heat *θ*. Both *u* and *θ* are equal to zero on *∂D*:

$$u|\_{\partial D} = 0,$$

$$\theta|\_{\partial D} = 0.$$

The condition *u*|*∂D* = 0 provokes several interesting technical questions. The equations are

$$\begin{aligned} \partial\_t u + u \cdot \nabla u + \nabla p &= f \\ \nabla \cdot u &= 0 \\ \partial\_t \theta + u \cdot \nabla \theta &= \kappa \Delta \theta + q \\ u|\_{t=0} &= u\_0 \\ \theta|\_{t=0} &= \theta\_0. \end{aligned} \tag{8}$$

where *f* and *q* account for interaction with external sources. In particular, physical boundaries are never completely smooth. Hence, the external source *f* is meant to model the effects of the roughness of the boundary and its influence on the velocity of the fluid. The instability of the flow at the boundary, originating vortices, is very strong, hence the frequency and intensity of vortex creation at the boundary strongly suffer from the imprecision of the description of the true boundary. Replacing the true details of the boundary by a random mechanism of vorticity production would increase the realism of the model. Emergence of vortices near obstacles is commonly observed and we content ourselves with an ad hoc inclusion of this fact into the equations. Assume the velocity field at time *t* is *u(t, x)*. Assume that, as a consequence of an instability near the boundary, a modification occurs and in a very short time we have a field *u(t* + *Δt, x)* which is not just equal to the smooth evolution of *u(t, x)*. We may assume that at some time *t* we have a jump:

$$
u(t + \Delta t, \mathbf{x}) = u(t, \mathbf{x}) + \sigma(\mathbf{x}),
$$

where *σ (x)* is presumably localized in space and corresponds to a vortex structure. After these preliminary comments we can accept to model the roughness via a friction term of intensity $-\frac{u}{\varepsilon}$ and a jump term described by


$$W\_N(t, \mathbf{x}) = \sum\_{k \in K} \frac{\sigma\_k(\mathbf{x})}{N} \frac{N^{k, 1}\_{N^2 t/\varepsilon^2} - N^{k, 2}\_{N^2 t/\varepsilon^2}}{\sqrt{2}},$$

where $N^{\cdot,\cdot}\_t$ are independent Poisson processes. More on this topic can be found in [6]. Applying a Donsker invariance principle to the stochastic process *WN (t, x)*, it converges in law to the Gaussian process

$$W\_t(\mathbf{x}) = \frac{1}{\varepsilon} \sum\_{k \in K} \sigma\_k(\mathbf{x}) W\_t^k,$$

where $W^k\_t$ are independent Brownian motions. Parameterizing the solutions of system (8) by *ε* we arrive at the following stochastic coupled system

$$\begin{aligned} \partial\_t \boldsymbol{u}^\varepsilon + \boldsymbol{u}^\varepsilon \cdot \nabla \boldsymbol{u}^\varepsilon + \nabla p^\varepsilon &= -\frac{1}{\varepsilon} \left( \boldsymbol{u}^\varepsilon - \partial\_t \boldsymbol{W} \right) \\ \nabla \cdot \boldsymbol{u}^\varepsilon &= 0 \\ \partial\_t \theta^\varepsilon + \boldsymbol{u}^\varepsilon \cdot \nabla \theta^\varepsilon &= \kappa \Delta \theta^\varepsilon + q \\ \boldsymbol{u}|\_{t=0} &= \boldsymbol{u}\_0 \\ \theta|\_{t=0} &= \theta\_0. \end{aligned} \tag{9}$$

The last step in moving from system (9) to Eq. (4) is to understand the behavior of system (9) as *ε* → 0; it is based on a result proved in [12] in the case of the 2D torus, while the case of general 2D domains with boundary is under analysis. Thus, just for the last result of this subsection, we assume that $D = \mathbb{T}^2 := \mathbb{R}^2/(2\pi\mathbb{Z})^2$.

**Theorem 3** *Under previous assumptions on q and σ, if moreover:*

*– the coefficients $\sigma\_k$ are zero-mean and there exists $l \ge 1$ such that $\sigma\_k \in W^{l,\infty}$ for every $k \in K$;*
*– for every $\mathbf{x} \in \mathbb{T}^2$ it holds $\sum\_{k \in K} \left((K \ast \sigma\_k) \cdot \nabla \sigma\_k\right)(\mathbf{x}) = 0$;*
*– $q \in L^1\left([0, T]; L^\infty(\mathbb{T}^2)\right)$;*
*– $\theta\_0 \in L^\infty(\mathbb{T}^2)$,*

*then for every* $f \in L^1(\mathbb{T}^2)$

$$\mathbb{E}\left[\left|\int\_{\mathbb{T}^2} (\theta\_t^{\varepsilon} - \theta\_t)(\mathbf{x}) f(\mathbf{x}) \, d\mathbf{x} \right|\right] \to 0 \qquad \text{as } \varepsilon \to 0$$

*for every fixed $t \in [0, T]$ and in $L^p([0, T])$ for every finite p. Moreover, if $q \in L^1([0, T]; Lip(\mathbb{T}^2))$ then the previous convergence holds uniformly for $t \in [0, T]$ and $f \in Lip(\mathbb{T}^2)$ with $[f]\_{Lip(\mathbb{T}^2)} \le 1$ and $\|f\|\_{L^\infty(\mathbb{T}^2)} \le 1$.*

#### **3 Main Results**

The results related to the analysis of these equations can be classified in three categories:


*Remark 2* Even if *Q* is a covariance operator, the third question is far from trivial. Indeed, we assumed that *σk*|*∂D* = 0, so the operator L degenerates at the boundary.

We will treat all three problems above, sometimes specializing our general framework.

**Theorem 4** *Assume D is a bounded domain.*

*1. If $\theta\_0 \in L^2(\mathcal{F}\_0, H)$ and $q \equiv 0$, then for every $\phi \in L^\infty(D)$,*

$$\mathbb{E}\left[\left\langle\phi,\theta(t)-\Theta(t)\right\rangle^2\right] \le \frac{\varepsilon\_{\mathcal{Q}}}{\kappa} \mathbb{E}\left[\left\|\theta\_{0}\right\|^2\right] \left\|\phi\right\|^2\_{L^{\infty}(D)}.$$

*2. Moreover, if θ*<sup>0</sup> ≥ 0

$$\mathbb{E}\left[\left\|\theta(t)\right\|^2\right] \le \left(\frac{\varepsilon\_{\mathcal{Q}}}{\kappa} + 2|D|e^{-\kappa\lambda\_{\mathcal{L}}t}\right)\mathbb{E}\left[\left\|\theta\_0\right\|^2\right].$$

*Remark 3* A result similar to the first item can be proved also in the case where *D* is the infinite channel and *q* ≡ 0, adapting the proof of Theorem 7 in [10] to the finite-time case.

Thanks to the previous theorem it is evident that the dissipation properties of the solution of the stochastic Eq. (5) are influenced by the first eigenvalue of the operator L but also by the operator norm of $\mathbb{Q}^{1/2}$. Thus, our next step will be to state some sufficient conditions ensuring that *εQ* is very small and $\kappa\lambda\_{\mathcal{L}} \gg \kappa\lambda$.

For *δ >* 0 fixed, let us define

$$D\_{\delta} := \{ \mathbf{x} \in D \, : \, dist(\mathbf{x}, \partial D) > \delta \}.$$

Then the following theorems hold.

**Theorem 5** *Assume that the family of coefficients (σk (*·*))k*∈*<sup>K</sup> has the following approximate orthogonality property: there exists a finite number <sup>M</sup>* <sup>∈</sup> <sup>N</sup> *and a partition K* = *K*<sup>1</sup> ∪ *...* ∪ *KM such that*

$$
\langle \sigma\_k, \sigma\_{k'} \rangle = 0 \quad \text{for all } k \neq k' \in K\_i
$$

*for all i* = 1*,...,M. Then*

$$\varepsilon\_{\mathcal{Q}} \le M \sup\_{k \in K} \left\lVert \sigma\_k \right\rVert^2.$$
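Theorem 5 can be checked on a discretized example: the operator norm of $\mathbb{Q}$ equals the largest eigenvalue of the Gram matrix $(\langle\sigma\_k,\sigma\_{k'}\rangle)\_{k,k'}$, so with *M* classes of mutually orthogonal fields the bound follows. The bump profile and class layout below are illustrative choices, not the construction of the text:

```python
import numpy as np

# Grid on [0, 1] with L^2 inner product <f, g> = sum f * g * dx
n = 1000
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx

def bump(center, width):
    """Smooth bump supported in (center - width, center + width)."""
    s = (x - center) / width
    return np.where(np.abs(s) < 1.0,
                    np.exp(-1.0 / np.maximum(1.0 - s**2, 1e-12)), 0.0)

# M classes; within each class the bumps have disjoint supports, hence are
# exactly orthogonal, while bumps from different classes may overlap.
M = 3
width = 0.04
sigmas = []
for i in range(M):                                    # class index
    for c in np.arange(0.1, 0.95, 0.25) + 0.02 * i:   # disjoint centers per class
        sigmas.append(bump(c, width))
S = np.array(sigmas)                                  # shape (|K|, n)

gram = S @ S.T * dx                                   # <sigma_k, sigma_k'>
eps_Q = np.linalg.eigvalsh(gram).max()                # operator norm of Q
bound = M * (np.sum(S**2, axis=1) * dx).max()         # M * sup_k ||sigma_k||^2
print(eps_Q, bound)
```

The finite-rank operator $\mathbb{Q}$ and the Gram matrix share their nonzero spectrum, which is why the eigenvalue computation above suffices.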

**Theorem 6** *Assume that $\tilde{q}(\mathbf{x}) \ge \sigma^2$ in $D\_\delta$. Then for any κ >* 0 *fixed*

$$\lim\_{(\sigma,\delta)\to(+\infty,0)}\kappa\lambda\_{\mathcal{L}}=+\infty.$$

**Theorem 7** *There exists a constant CD >* 0 *such that*

$$
\kappa \lambda\_{\mathcal{L}} \ge C\_D \min \left( \sigma^2, \frac{\kappa}{\delta} \right),
$$

*for every Q such that*

$$
\tilde{q}(\mathbf{x}) \ge \sigma^2 \text{ in } D\_\delta.
$$

*When D is the unit ball, asymptotically as δ* → 0*, one can take CD* = 1 *and*

$$
\kappa \lambda\_{\mathcal{L}} \ge \frac{2\kappa}{\kappa + \delta \sigma^2} \sigma^2.
$$

From the last two theorems we understand that the dissipation properties are enhanced if *εQ* is very small and $\tilde{q}(\mathbf{x})$ is very large except for a small boundary layer around *∂D*. Obviously *εQ* is related to the operator norm of $\mathbb{Q}^{1/2}$ and thus, loosely speaking, to the operator norm of $\mathbb{Q}$. Instead $\tilde{q}(\mathbf{x})$ is related to the trace of $\mathbb{Q}$, i.e.

$$\operatorname{Tr}(\mathbb{Q}) = \int\_{D} \operatorname{Tr} \mathcal{Q}(\mathbf{x}, \mathbf{x}) \, d\mathbf{x}.$$

Consequently we want the operator norm of $\mathbb{Q}$ to be small and the trace of $\mathbb{Q}$ to be arbitrarily large, possibly infinite. Hence the existence of operators *Q* which increase the dissipativity properties of the equation is not surprising. The last issue related to this topic is the presentation of an operator *Q* which has a fluid dynamics interpretation and satisfies the previous properties. This definition for a general domain *D* is a bit implicit; thus in the last section we will present some more explicit computations.

Let us fix a parameter $\ell$ such that $0 < \ell \le \delta$, consider a smooth probability density function $\Psi : \mathbb{R}^2 \to \mathbb{R}$ with compact support in $B(0, 1)$ and let us denote by *K(x, y)* the Biot-Savart kernel in *D*. We recall that a point vortex at $\mathbf{x}\_0$ has vorticity $\delta\_{\mathbf{x}\_0}$; smoothing it by $\Psi\_\ell(\mathbf{x}) := \frac{1}{\ell^2}\Psi\left(\frac{\mathbf{x}}{\ell}\right)$, it has vorticity $\frac{1}{\ell^2}\Psi\left(\frac{\mathbf{x} - \mathbf{x}\_0}{\ell}\right)$.

Now let us consider a random variable $X\_0$ distributed uniformly on $D\_{2\delta}$ and a real random variable $\Gamma\_0$ such that

$$\mathbb{E}\left[\varGamma\_0\right] = 0, \qquad \varepsilon\_0^2 := \mathbb{E}\left[\varGamma\_0^2\right] < \infty$$

and set

$$
u\_{\ell}(\mathbf{x}) = \Gamma\_0 \int\_D K\left(\mathbf{x}, \mathbf{y}\right) \Psi\_{\ell}(\mathbf{y} - X\_0)\, d\mathbf{y} =: \Gamma\_0 K\_{\ell}(\mathbf{x}, X\_0).
$$

If we consider in Eq. (5) the Brownian motion *W (t,x)*, with covariance operator

$$\mathcal{Q}\_{\ell}(\mathbf{x}, \mathbf{y}) = \varepsilon\_0^2 \mathbb{E} \left[ K\_{\ell}(\mathbf{x}, X\_0) \otimes K\_{\ell}(\mathbf{y}, X\_0) \right]$$

one has

$$v^T \mathcal{Q}\_\ell(\mathbf{x}, \mathbf{x}) \, v = \varepsilon\_0^2 \mathbb{E} \left[ \left| K\_\ell(\mathbf{x}, X\_0) \cdot v \right|^2 \right] \qquad \text{for } v \in \mathbb{R}^2$$

$$\langle \mathbb{Q}\_{\ell}w, w \rangle = \varepsilon\_0^2 \mathbb{E} \left[ \left( \int\_D w \,(\mathbf{x}) \cdot K\_{\ell}(\mathbf{x}, X\_0) d\mathbf{x} \right)^2 \right] \qquad \text{for } w \in L^2(D; \mathbb{R}^2).$$

The previous identities contain the key to having $v^T \mathcal{Q}\_\ell(\mathbf{x}, \mathbf{x})\, v$ large and $\langle \mathbb{Q}\_\ell w, w \rangle$ small. Moreover, the law of *u* on the space of divergence-free square integrable vector fields with null normal trace is, heuristically, a Poisson point process generating smoothed point vortices (and the associated velocity field) at random positions of *D*. Thus this kind of noise is a reasonable model for what we expect from the heuristic analysis described in Sect. 2.

#### **Theorem 8**

*– There exists a constant C >* 0 *such that*

$$
\langle \mathbb{Q}\_\ell v, v \rangle \le C \varepsilon\_0^2 \left\| v \right\|\_{H}^2
$$

*for every $v \in H$ and $\ell > 0$.*

*– For every $\mathbf{x} \in D$, let $q\_\ell(\mathbf{x}) \ge 0$ be the largest number such that*

$$v^T \mathcal{Q}\_\ell \left( \mathbf{x}, \mathbf{x} \right) v \ge q\_\ell \left( \mathbf{x} \right) \left| v \right|^2$$

*for all <sup>v</sup>* <sup>∈</sup> <sup>R</sup>2*. Then*

$$\lim\_{\ell \to 0} \inf\_{\substack{\mathbf{x} \in D\_{2\delta}}} q\_{\ell}(\mathbf{x}) = +\infty.$$
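The blow-up of the ellipticity constant in Theorem 8 can be glimpsed with a quick Monte Carlo computation. As a simplifying assumption we replace the exact smoothed kernel $K\_\ell$ of the bounded domain by the standard free-space vortex-blob desingularization $(\mathbf{x}-\mathbf{x}\_0)^\perp / \big(2\pi(|\mathbf{x}-\mathbf{x}\_0|^2+\ell^2)\big)$ and take the unit disk as a stand-in for $D\_{2\delta}$:

```python
import numpy as np

rng = np.random.default_rng(42)

def K_blob(x, x0, ell):
    """Free-space vortex-blob kernel (desingularized Biot-Savart), vectorized over x0."""
    d = x - x0                                  # shape (m, 2)
    r2 = np.sum(d**2, axis=1) + ell**2
    return np.column_stack((-d[:, 1], d[:, 0])) / (2 * np.pi * r2[:, None])

# X_0 uniform on the unit disk (rejection sampling)
n = 400_000
pts = rng.uniform(-1, 1, size=(n, 2))
X0 = pts[np.sum(pts**2, axis=1) <= 1.0]

x = np.array([0.0, 0.0])                        # evaluation point
v = np.array([1.0, 0.0])                        # test direction, |v| = 1

def q_est(ell):
    """Monte Carlo estimate of E[|K_ell(x, X_0) . v|^2]."""
    return np.mean((K_blob(x, X0, ell) @ v) ** 2)

# The estimated ellipticity constant grows (logarithmically) as ell -> 0
vals = [q_est(ell) for ell in (0.2, 0.05, 0.01)]
print(vals)
```

The logarithmic growth matches the log-divergence of $\int |K(\mathbf{x},\mathbf{y})|^2\, d\mathbf{y}$ for the singular Biot-Savart kernel.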

In the last result of this section the presence of the external source *q* is crucial. Moreover we assume that *q* is independent of time and introduce the stationary solution of Eq. (6)

$$
\Theta\_{st} := -A^{-1}q.
$$

In fact we want to study the convergence of the solution of the stochastic Eq. (5) to *Θst* .

Set

$$C\_{\infty}(\theta\_0, q) := \sup\_{t \ge 0} \mathbb{E}\left[ \|\theta\left(t\right)\|\_{\infty}^2 \right].$$

**Theorem 9** *If $\theta\_0 \in L^2(\mathcal{F}\_0, D(A))$ and $q \in D(A)$, then*

*– $C\_\infty(\theta\_0, q) < \infty$.*

*– For every φ* ∈ *H*

$$\limsup\_{t \to \infty} \mathbb{E}\left[ \left| \langle \theta \left( t \right) - \Theta\_{st}, \phi \rangle \right|^2 \right] \le \frac{\varepsilon\_{\mathcal{Q}}}{\kappa} \left\| \phi \right\|^2 C\_{\infty}(\theta\_0, q).$$

In order to be of interest for applications, this theorem requires two conditions:


Obviously if $\kappa\lambda\_{\mathcal{L}} \gg \kappa\lambda$ then $\Theta\_{st}$ is significantly affected by the noise. Thus we are led back to the framework already treated. In Sect. 4 we will show a concrete example where this phenomenon appears.

#### **4 Explicit Computations**

Theorem 8 is not completely suitable for numerical simulations because the definition of *K(x, y)* is not explicitly available for every smooth bounded domain. In this section we will present an explicit construction with a fluid dynamics interpretation, again based on vortex structures, which makes *εQ* arbitrarily small and $\tilde{q}$ arbitrarily large outside a boundary layer. Moreover we will show numerically that, even relaxing the conditions of this construction, the noise influences the behavior of the stationary solution.

#### *4.1 Explicit Construction*

We will construct a noise of the form $\Gamma \sum\_{k \in K} u\_k(\mathbf{x}) \, dW^k\_t$ with

$$u\_k\left(\mathbf{x}\right) = w\_r\left(\mathbf{x} - \mathbf{x}\_k\right), \qquad w\_r\left(\mathbf{x}\right) = r^{-1}w\left(\frac{\mathbf{x}}{r}\right)$$

for suitable *r* and *w*. Thus, the covariance of this noise is

$$\mathcal{Q}\left(\mathbf{x},\mathbf{y}\right) = \Gamma^2 \sum\_{z \in \Lambda\_N} w\_r\left(\mathbf{x} - z\right) \otimes w\_r\left(\mathbf{y} - z\right).$$

We need to choose *xk*, called the "centers" of the vortex blobs, and a suitable vector field *w*. The vector field *w* must satisfy several conditions:


The first two properties ensure that the $u\_k$'s model the velocity of an incompressible fluid at rest. The third one is close to our idea of vortex structures.

Now we choose the centers. For a fixed *δ >* 0, we choose a positive integer *N* such that $\frac{1}{N} \le \delta$. Then we consider the set $\Lambda\_N$ of all points of $D\_\delta$ having coordinates of the form $\left(\frac{k}{N}, \frac{h}{N}\right)$ with $k, h \in \mathbb{Z}$. Thanks to this choice we have

$$\min\_{z\_1 \neq z\_2 \in \Lambda\_N} |z\_1 - z\_2| = \frac{1}{N}, \qquad \min\_{z \in \Lambda\_N} d\left(z, \,\partial D\right) \ge \delta.$$
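The grid $\Lambda\_N$, together with the residue-class partition introduced just below, is easy to reproduce and check; here the unit square is taken as a concrete *D*, which is an assumption of this sketch:

```python
from itertools import combinations
import math

# Assumption: D = unit square, so dist(x, dD) = min(x, 1 - x, z, 1 - z)
delta, N, M = 0.1, 20, 4

def dist_to_boundary(p):
    x, z = p
    return min(x, 1 - x, z, 1 - z)

# Lambda_N: grid points (k/N, h/N) lying in D_delta
Lambda_N = [(k / N, h / N)
            for k in range(1, N) for h in range(1, N)
            if dist_to_boundary((k / N, h / N)) > delta]

# Partition by residues (k mod M, h mod M): M^2 classes of well-separated points
classes = {}
for (px, pz) in Lambda_N:
    key = (round(px * N) % M, round(pz * N) % M)
    classes.setdefault(key, []).append((px, pz))

def min_dist(points):
    return min(math.dist(p, q) for p, q in combinations(points, 2))

print(min_dist(Lambda_N))                                            # 1/N
print(min(min_dist(c) for c in classes.values() if len(c) > 1))      # M/N
```

The two printed separations, 1/*N* over the whole grid and *M/N* within each class, are exactly the quantities stated in the two displayed formulas of this subsection.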

We choose another positive integer *M* and we decompose the set $\Lambda\_N$ as the disjoint union of the sets

$$\Lambda\_N = \bigcup\_{(k\_0, h\_0) \in \{0, 1, \dots, M - 1\}^2} \Lambda\_N^{(M, k\_0, h\_0)}$$

where $\left(\frac{k}{N}, \frac{h}{N}\right) \in \Lambda^{(M, k\_0, h\_0)}\_N$ if $k = Mn + k\_0$, $h = Mm + h\_0$, with $n, m \in \mathbb{Z}$. In this way, we have

$$\min\_{z\_1 \neq z\_2 \in \Lambda\_N^{(M, k\_0, h\_0)}} |z\_1 - z\_2| = \frac{M}{N}$$

for each $(k\_0, h\_0) \in \{0, 1, \dots, M - 1\}^2$. We have introduced *M* and the sets $\Lambda^{(M, k\_0, h\_0)}\_N$ so that each pair $u\_j$, $u\_k$ in the same class have disjoint supports for *r* small enough; this is sufficient for our estimates, because it implies that the vector fields are "almost" orthogonal in the sense of Theorem 5. In order to have disjoint supports for elements of $\Lambda^{(M, k\_0, h\_0)}\_N$ while the action of the noise covers the full set $D\_{2\delta}$, we ask $r \le \frac{M}{2N}$. Now we can focus on the vector field *w*. In order for it to be divergence free we set $w = \nabla^\perp \psi$. Thus, we look for a smooth function *ψ* on $\mathbb{R}^2$, compactly supported in $B(0, 1)$ and close to $\frac{1}{2\pi} \log|\mathbf{x}|$ near $\mathbf{x} = 0$. A possible construction is the following one:

$$
\psi(\mathbf{x}) = \int\_{\mathbb{R}^2} \psi\_0(\mathbf{x} - \mathbf{y}) f\_\varepsilon(\mathbf{y}) \, d\mathbf{y}
$$

where $f\_\varepsilon$ is a mollifier with support in $B(0, \varepsilon)$ and $\psi\_0$ is a $C^\infty(\mathbb{R}^2 \setminus \{0\})$ radial function such that

$$
\psi\_0(\mathbf{x}) = \frac{\log|\mathbf{x}|}{2\pi} \quad \text{for} \ |\mathbf{x}| \le \frac{1}{3} \text{ and } \psi\_0(\mathbf{x}) = 0 \quad \text{for} \ |\mathbf{x}| > \frac{2}{3}.
$$

Moreover, it can be proved that *w* defined above satisfies

$$\left\|\boldsymbol{w}\right\|^2 \leq C \log \frac{1}{\varepsilon}, \qquad \left\|\boldsymbol{w}\_r\right\|^2 = \left\|\boldsymbol{w}\right\|^2.$$
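The scale invariance $\|w\_r\| = \|w\|$ is a pure change-of-variables fact in 2D: $\int |r^{-1} w(\mathbf{x}/r)|^2\, d\mathbf{x} = r^{-2} \cdot r^2 \int |w|^2$. A quick numerical check, with an illustrative smooth compactly supported field *w* (a perp-gradient of a bump, not the mollified-log construction above):

```python
import numpy as np

def l2_norm_sq(field_fn, half_width, n=800):
    """Approximate squared L^2(R^2) norm of a compactly supported 2D vector field."""
    xs = np.linspace(-half_width, half_width, n)
    X, Y = np.meshgrid(xs, xs)
    dA = (xs[1] - xs[0]) ** 2
    u, v = field_fn(X, Y)
    return np.sum(u**2 + v**2) * dA

def w(X, Y):
    """Illustrative divergence-free field supported in B(0, 1): w = perp-grad psi."""
    R2 = X**2 + Y**2
    psi = np.exp(-1.0 / np.maximum(1.0 - R2, 1e-12)) * (R2 < 1.0)
    # grad psi = (x, y) * g with g = -2 psi / (1 - R2)^2, computed analytically
    g = -2.0 * psi / np.maximum(1.0 - R2, 1e-12) ** 2
    return -Y * g, X * g               # perp-gradient (-psi_y, psi_x)

r = 0.25
def w_r(X, Y):
    return tuple(c / r for c in w(X / r, Y / r))

n1 = l2_norm_sq(w, 1.1)
n2 = l2_norm_sq(w_r, 1.1 * r)
print(abs(n1 - n2) / n1)               # small relative error
```

The same rescaling argument is what makes the normalization by $\|w\|$ in the next estimate legitimate for every *r*.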

Thanks to these relations we can easily obtain an estimate of *εQ*:

$$\begin{aligned} \int\_D \int\_D v\left(\mathbf{x}\right)^{T} \mathcal{Q}\left(\mathbf{x}, \mathbf{y}\right) v\left(\mathbf{y}\right) d\mathbf{x}\, d\mathbf{y} &= \Gamma^{2} \sum\_{z \in \Lambda\_{N}} \left( \int\_D w\_{r}\left(\mathbf{x} - z\right) \cdot v\left(\mathbf{x}\right) d\mathbf{x} \right)^{2} \\ &= \left\|w\right\|^{2} \Gamma^{2} \sum\_{(k\_0, h\_0) \in \{0, 1, \dots, M-1\}^{2}} \sum\_{z \in \Lambda\_{N}^{(M, k\_0, h\_0)}} \left( \int\_D \frac{w\_{r}\left(\mathbf{x} - z\right)}{\left\|w\right\|} \cdot v\left(\mathbf{x}\right) d\mathbf{x} \right)^{2} \\ &\leq M^{2} \left\|w\right\|^{2} \Gamma^{2} \left\|v\right\|^{2}. \end{aligned}$$

Thus, taking $\varepsilon = \frac{1}{N}$ we get

$$\varepsilon\_{\mathcal{Q}} \le M^2 \Gamma^2 C \log N$$

which is small if, given *N*, *Γ* is small enough.

Concerning the analysis of a lower bound for $\tilde{q}(\mathbf{x})$ in $D\_{2\delta}$, the computations are a bit more involved and we refer to [7] for a complete discussion, which is out of our scope. We just claim that if

$$r \geq \frac{12}{N}, \qquad M > 24, \qquad N \text{ is large enough}$$

then

$$
\tilde{q}(\mathbf{x}) \ge \frac{\Gamma^2 N}{16\pi} \qquad \text{in } D\_{2\delta}.
$$

#### *4.2 Numerical Simulation*

Summing up the results of previous subsection, we have seen that if

$$r \le \frac{M}{2N}, \qquad r \le \delta, \qquad r \ge \frac{12}{N}, \qquad \varepsilon = \frac{1}{N} \le \frac{1}{6}, \qquad N \text{ is large enough}$$

we have

$$
\varepsilon\_{\mathcal{Q}} \le M^2 \Gamma^2 C \log N, \qquad \tilde{q}(\mathbf{x}) \ge \frac{\Gamma^2 N}{16\pi} \qquad \text{in } D\_{2\delta}.
$$

These conditions are strong from the numerical point of view: the cardinality of *K* must be very large, a finite but not small *M* is required, and certain supports have to overlap so that the noise acts everywhere. However, in [10] it has been shown numerically that these conditions are overabundant and much less is required to see the influence of the noise on the solution, namely that $\Theta\_{st}$ differs significantly from the parabolic profile even for relatively modest sets *K* and for *M* = 1. In this subsection we work in an infinite 2D channel and suspend the requirement that *q, Θ* decay at infinity, although this is not strictly covered by the theory described in Sect. 2.1. We assume that the function *q (x)* is equal to a constant *q*.

For numerical reasons we consider the problem in the bounded domain

$$D = (\tan(-1.54), \tan(1.54)) \times (-0.1, 0.1).$$

In order to have that the *σk*'s model a fluid at rest, we can take

$$r \le \min\_{k \in K} d(\partial \tilde{D}, \mathbf{x}\_k) \text{ and } \text{ } \varepsilon < \frac{1}{6}.$$

These are the real constraints on the parameters of our numerical simulation. The other parameters *Γ*, *K*, $\{\mathbf{x}\_k\}\_{k \in K}$ can be chosen more freely while still obtaining satisfactory results.

Differently from [10], here the vortex structures have not been placed on a grid equally spaced in both directions; in particular, the points thicken in the *x*<sup>1</sup> direction. We have chosen 2 points in the *z* direction between −0.05 and 0.05, while in the *x*<sup>1</sup> direction we have chosen 2 points between 0 and 0.2, 4 points between 0.2 and 0.4 and 8 points between 0.4 and 0.6. In order to improve the smoothness of the solution, avoiding a shock in the number of vortices, we also consider a few vortices for *x*<sup>1</sup> *>* 0.6; they only slightly affect the behavior of the solution in the critical region of interest *x*<sup>1</sup> *<* 0.5. Thus we consider 4 points between 0.6 and 0.8 and 2 points between 0.8 and 1. Obviously we avoid repetition of the vortices. In conclusion we have 34 vortices. Moreover, we take *r* = 0.05, *ε* = 0.1 and *Γ* = 0.03. The other parameters of the problem are *κ* = 0.05 and *q* ≡ 1. In this way the quantities *M* and *N* are not well defined and the impact

**Fig. 1** Solution in the critical region

of the operator L is confined to a small portion of the domain $\tilde{D}$; however, we can fully appreciate how it changes the profile of the solution.

Figures 1 and 2 illustrate the modification of the profile, from the standard parabolic one of free diffusion in a steady medium to the case of turbulent decay. Even though we use a very small number of vortices, we can observe a significant modification of the decay profile due to turbulence where the vortices thicken.

**Fig. 2** Profiles at different values of *x*<sup>1</sup>

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Existence and Uniqueness of Maximal Solutions to a 3D Navier-Stokes Equation with Stochastic Lie Transport**

#### **Daniel Goodair**

**Abstract** We present here a criterion to conclude that an abstract SPDE possesses a unique maximal strong solution, which we apply to a three dimensional Stochastic Navier-Stokes Equation. Motivated by the work of Kato and Lai we ask that there is a comparable result here in the stochastic case whilst facilitating a variety of noise structures such as additive, multiplicative and transport. In particular our criterion is designed to fit viscous fluid dynamics models with Stochastic Advection by Lie Transport (SALT) as introduced in Holm (Proc R Soc A: Math Phys Eng Sci 471(2176):20140963, 2015). Our application to the Incompressible Navier-Stokes equation matches the existence and uniqueness result of the deterministic theory. This short work summarises the results and announces two papers (Crisan et al., Existence and uniqueness of maximal strong solutions to nonlinear SPDEs with applications to viscous fluid models, in preparation; Crisan and Goodair, Analytical properties of a 3D stochastic Navier-Stokes equation, 2022, in preparation) which give the full details for the abstract well-posedness arguments and application to the Navier-Stokes Equation respectively.

**Keywords** Stochastic transport · SPDE · Navier-Stokes · Well-posedness

#### **1 Introduction**

The theoretical analysis of fluid models perturbed by transport noise has been in significant demand since the release of the seminal works [16] and [17]. In the papers Holm and Mémin establish a new class of stochastic equations driven by transport noise which serve as much improved fluid dynamics models by adding uncertainty in the transport of the fluid parcels to reflect the unresolved scales. Here we consider the SALT [16] Navier-Stokes Equation given by

D. Goodair (-)

Imperial College London, London, England, UK e-mail: daniel.goodair16@imperial.ac.uk https://www.imperial.ac.uk/people/daniel.goodair16

© The Author(s) 2023

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_7

$$u\_t - u\_0 + \int\_0^t \mathcal{L}\_{u\_s} u\_s \, ds - \int\_0^t \Delta u\_s \, ds + \int\_0^t B u\_s \circ d\mathcal{W}\_s + \int\_0^t \nabla \rho\_s \, ds = 0 \tag{1}$$

and supplemented with the divergence-free (incompressibility) and zero-average conditions on the three dimensional torus $\mathbb{T}^3$. The equation is presented here in velocity form where *u* represents the fluid velocity, *ρ* the pressure, L is the mapping corresponding to the nonlinear term, W is a cylindrical Brownian Motion and *B* is the relevant transport operator defined with respect to a collection of functions *(ξi)* which physically represent spatial correlations. The explicit meaning of these conditions and the definitions of the operators involved are given at the beginning of Sect. 2.2. These *(ξi)* can be determined at coarse-grain resolutions from finely resolved numerical simulations, and mathematically are derived as eigenvectors of a velocity-velocity correlation matrix (see [3, 4, 5]). The corresponding stochastic Euler equation was derived in [12] and the viscous term plays no additional role in the stochastic derivation (without loss of generality we set the viscosity coefficient to be 1).

There has been limited progress in proving well-posedness for this class of equations: Crisan, Flandoli and Holm [5] have shown local existence and uniqueness for the 3D Euler equation on the torus, whilst Crisan and Lang [9, 10, 11] demonstrated the same result for the Euler, Rotating Shallow Water and Great Lake equations, again on the torus. Whilst this represents a strong start in the theoretical analysis (alongside works for SPDEs with general transport noise, e.g. [1, 2]), the modelling literature continues to expand in both the deterministic fluid models (see for example Figure 2 of [8] and the analysis therein) and the methods of stochastic perturbation (for example we may soon look to introduce nonlinearity and time dependence in the $(\xi_i)$). The significance of an abstract approach to the well-posedness question is clear, and whilst we discuss here only an application to SALT Navier-Stokes [12, 16], the hope is that other stochastic viscous fluid models can be similarly solved by simply checking the required assumptions. We state our equation in the form

$$
\Psi_t = \Psi_0 + \int_0^t \mathcal{A}(s, \Psi_s)\,ds + \int_0^t \mathcal{G}(s, \Psi_s)\,d\mathcal{W}_s \tag{2}
$$

for operators $\mathcal{A}$ and $\mathcal{G}$ to be elucidated in due course. The most notable contribution to the well-posedness theory for an abstract nonlinear SPDE is [13]. There the authors prove the existence of a unique maximal solution to their abstract equation and apply it to the three dimensional primitive equations with a Lipschitz type multiplicative noise. The class of equations with which we are concerned includes a differential operator in the noise term, preventing us from applying this framework. Moreover, their assumptions on the operator $\mathcal{A}$ are quite explicit in terms of the sum of the standard fluid nonlinear term and a linear operator, a restriction we do not impose. Overall our assumptions are much more general and allow for a straightforward application to a wider class of SPDEs. Another relevant work is that of Glatt-Holtz and Ziane [14], who show the same existence and uniqueness for the incompressible 3D Navier-Stokes equation, again with a Lipschitz noise term. Though we cannot apply this method in the presence of our transport noise, we look to adapt the argument to fit not just our Navier-Stokes equation but the wider class of stochastic viscous fluid models and SPDEs beyond. The impact of the boundary is fundamental to the equation, and the approach of Glatt-Holtz and Ziane copes with the arising issues by working in the right function spaces; we recognised the importance of this in establishing an abstract framework which we hope to apply to such stochastic transport equations on bounded domains as well.

This short summary work contains three more sections: in the subsequent one we properly define our Stochastic Navier-Stokes equation through the operators involved, the relevant function spaces, the notions of solution and main results. Following this we concretely define our abstract formulation and notion of solution, giving the assumptions that we require and the main results for the abstract equation. These assumptions are then all that needs to be checked to conclude the relevant existence and uniqueness for the proposed SPDE. In the final section we discuss the key steps behind proving these results; in the spirit of this as a summary work announcing our results we do not give a complete proof, though all such arguments are to be found in [7]. We then address how our Navier-Stokes equation fits the context of the abstract formulation, though once more we do not give a thorough justification that the operators of our equation satisfy the required assumptions, with this precise treatment to come in [6].

#### **2 SALT Navier-Stokes and Results**

As alluded to, in this section we formally introduce Eq. (1) and state the main results.

#### *2.1 Preliminaries from Stochastic Analysis*

Throughout the paper we work with a fixed filtered probability space

$(\Omega, \mathcal{F}, (\mathcal{F}_t), \mathbb{P})$ satisfying the usual conditions of completeness and right continuity. We take $\mathcal{W}$ to be a cylindrical Brownian Motion over some Hilbert space $\mathfrak{U}$ with orthonormal basis $(e_i)$. The choice of $\mathfrak{U}$ and the subsequent basis play no role in the analysis. Recall ([15, Subsection 1.4]) that $\mathcal{W}$ admits the representation $\mathcal{W}_t = \sum_{i=1}^{\infty} e_i W^i_t$ as a limit in $L^2(\Omega; \mathfrak{U}')$, whereby the $(W^i)$ are a collection of i.i.d. standard real valued Brownian Motions and $\mathfrak{U}'$ is an enlargement of the Hilbert space $\mathfrak{U}$ such that the embedding $J : \mathfrak{U} \to \mathfrak{U}'$ is Hilbert-Schmidt and $\mathcal{W}$ is a $JJ^*$-cylindrical Brownian Motion over $\mathfrak{U}'$. Given a progressively measurable process $F : [0,T] \times \Omega \to \mathscr{L}^2(\mathfrak{U}; \mathscr{H})$ such that $F \in L^2\left(\Omega \times [0,T]; \mathscr{L}^2(\mathfrak{U}; \mathscr{H})\right)$, for any $0 \le t \le T$ we understand the stochastic integral

$$\int\_{0}^{t} F\_{s} d\mathcal{W}\_{s}$$

to be the infinite sum

$$\sum_{i=1}^{\infty} \int_0^t F_s(e_i)\,dW_s^i$$

taken in $L^2(\Omega; \mathscr{H})$. We can extend this notion to processes $F$ which are such that $F(\omega) \in L^2\left([0,T]; \mathscr{L}^2(\mathfrak{U}; \mathscr{H})\right)$ for $\mathbb{P}-a.e.$ $\omega$ via the traditional localisation procedure. In this case the stochastic integral is a local martingale in $\mathscr{H}$. A complete, direct construction of this integral, a treatment of its properties and the fundamentals of stochastic calculus in infinite dimensions can be found in [15, Section 1].
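As a point of reference (a standard fact of the theory, see again [15, Section 1]), for $F$ as above the Itô isometry takes the form

$$\mathbb{E}\left\| \int_0^t F_s \, d\mathcal{W}_s \right\|_{\mathscr{H}}^2 = \mathbb{E}\int_0^t \sum_{i=1}^{\infty} \left\| F_s(e_i) \right\|_{\mathscr{H}}^2 \, ds = \mathbb{E}\int_0^t \left\| F_s \right\|_{\mathscr{L}^2(\mathfrak{U};\mathscr{H})}^2 \, ds,$$

so the square-integrability condition on $F$ is precisely what makes the infinite sum of one dimensional stochastic integrals converge in $L^2(\Omega; \mathscr{H})$.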

#### *2.2 SALT Navier-Stokes Equation*

We present Eq. (1) on the three dimensional torus $\mathbb{T}^3$ (noting that all results hold on $\mathbb{T}^2$), and detail now the operators involved alongside the function spaces which define the equations. The operator $\mathcal{L}$ is defined for sufficiently regular functions $\phi, \psi : \mathbb{T}^3 \to \mathbb{R}^3$ by

$$\mathcal{L}\_{\phi}\psi := \sum\_{j=1}^{3} \phi^{j}\,\partial\_{j}\psi$$

where $\phi^j : \mathbb{T}^3 \to \mathbb{R}$ is the $j$th coordinate mapping of $\phi$ and $\partial_j \psi$ is defined by its $k$th coordinate mapping $(\partial_j \psi)^k = \partial_j \psi^k$. The operator $B$ is defined as a linear operator on $\mathfrak{U}$ (introduced in Sect. 2.1) through its action on the basis vectors, $B(e_i, \cdot) := B_i(\cdot)$, with

$$B_i = \mathcal{L}_{\xi_i} + \mathcal{T}_{\xi_i}$$

for $\mathcal{L}$ as above and

$$\mathcal{T}\_{\phi}\psi := \sum\_{j=1}^{3} \psi^{j}\nabla\phi^{j}.$$
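Unpacking these definitions (our own componentwise expansion, for orientation), the action of a single $B_i$ on a velocity field $u$ reads

$$(B_i u)^k = (\mathcal{L}_{\xi_i} u)^k + (\mathcal{T}_{\xi_i} u)^k = \sum_{j=1}^{3} \xi_i^j \, \partial_j u^k + \sum_{j=1}^{3} u^j \, \partial_k \xi_i^j,$$

that is, transport of $u$ along $\xi_i$ plus a lower order term involving the gradients of $\xi_i$. Note that $B_i$ involves one derivative of $u$; this is the source of the difficulty with frameworks built for Lipschitz noise.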

A complete discussion of how $B$ is then defined on $\mathfrak{U}$ is given in [15, Subsection 2.2]. We embed the divergence-free and zero-average conditions into the relevant function spaces and simply define our solutions as belonging to these spaces. To be explicit, by a divergence-free function we mean a $\phi \in W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ such that

$$\sum\_{j=1}^{3} \partial\_j \phi^j = 0$$

and by zero-average we ask for a $\psi \in L^2(\mathbb{T}^3; \mathbb{R}^3)$ with the property

$$\int\_{\mathbb{T}^3} \psi \,d\lambda = 0$$

for $\lambda$ the Lebesgue measure on $\mathbb{T}^3$. We introduce the space $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ as the subspace of $L^2(\mathbb{T}^3; \mathbb{R}^3)$ consisting of zero-average functions which are 'weakly divergence-free'; see [18] Definition 2.1 for the precise construction. $W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ is then defined as the subspace of $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ consisting of zero-average divergence-free functions, and $W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3) := W^{2,2}(\mathbb{T}^3; \mathbb{R}^3) \cap W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$.

As is standard in the treatment of the incompressible Navier-Stokes equation, we consider a projected version to eliminate the pressure term and facilitate working in the above spaces. Note that $\rho$ does not come with an evolution equation and is simply chosen to ensure the incompressibility condition. The idea is to solve the projected equation and then append a pressure to it, see [18]. To this end we introduce the standard Leray Projector $\mathcal{P}$, defined as the orthogonal projection in $L^2(\mathbb{T}^3; \mathbb{R}^3)$ onto $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$. As we look to project equation (1) as discussed, we ought to address the Stratonovich integral. We look to convert this term into an Itô integral to enable our analysis, but the resulting converted and projected equation should not depend on the order in which the projection and conversion occur. To this end we assume that the $(\xi_i)$ are such that $\xi_i \in W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3) \cap W^{3,\infty}(\mathbb{T}^3; \mathbb{R}^3)$ and satisfy the bound

$$\sum_{i=1}^{\infty} \left\| \xi_i \right\|_{W^{3,\infty}}^2 < \infty. \tag{3}$$

The significance of the bound (3) will be revisited, but for now we note that as each $\xi_i$ is divergence-free, each $B_i$ satisfies the property that $\mathcal{P}B_i$ is equal to $\mathcal{P}B_i\mathcal{P}$ on $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$, which ensures that the projection and conversion commute. Our new equation is then

$$\begin{aligned} u_t - u_0 + \int_0^t \mathcal{P} \mathcal{L}_{u_s} u_s \, ds + \int_0^t A u_s \, ds \\ - \frac{1}{2} \sum_{i=1}^{\infty} \int_0^t \mathcal{P} B_i^2 u_s \, ds + \sum_{i=1}^{\infty} \int_0^t \mathcal{P} B_i u_s \, dW_s^i = 0 \end{aligned} \tag{4}$$

where $A := -\mathcal{P}\Delta$ is known as the Stokes Operator. Details of the Itô-Stratonovich conversion can be found in [15, Subsection 2.3]. We shall use the Stokes operator to define inner products with which we equip our function spaces. Recall from [18] Theorem 2.24, for example, that there exists a collection of functions $(a_k)$, $a_k \in W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3) \cap C^\infty(\mathbb{T}^3; \mathbb{R}^3)$, such that the $(a_k)$ are eigenfunctions of $A$, are an orthonormal basis in $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ and an orthogonal basis in $W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, considered as Hilbert spaces with the standard $L^2(\mathbb{T}^3; \mathbb{R}^3)$, $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ inner products. The corresponding eigenvalues $(\lambda_k)$ are strictly positive and approach infinity as $k \to \infty$. Thus any $\phi \in W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ admits the representation

$$\phi = \sum\_{k=1}^{\infty} \phi\_k a\_k$$

so for $m \in \mathbb{N}$ we can define $A^{m/2}$ by

$$A^{m/2} : \phi \mapsto \sum\_{k=1}^{\infty} \lambda\_k^{m/2} \phi\_k a\_k$$

which is a well defined element of $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ for any $\phi$ such that

$$\sum\_{k=1}^{\infty} \lambda\_k^m \phi\_k^2 < \infty. \tag{5}$$

For $\phi, \psi$ with the property (5), the bilinear form

$$\langle \phi, \psi \rangle\_m := \langle A^{m/2} \phi, A^{m/2} \psi \rangle$$

is well defined. For $m = 1, 2$ this is an inner product on the spaces $W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, $W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ respectively, equivalent to the standard $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$, $W^{2,2}(\mathbb{T}^3; \mathbb{R}^3)$ inner product. Of course $\langle \cdot, \cdot \rangle_3$ is well defined on $\bigcup_{k=1}^{\infty} \mathrm{span}\{a_1, \dots, a_k\}$, and so we define $W^{3,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ as the completion of $\bigcup_{k=1}^{\infty} \mathrm{span}\{a_1, \dots, a_k\}$ in this inner product. We consider $W^{m,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ as a Hilbert space equipped with the $\langle \cdot, \cdot \rangle_m$ inner product, and define our solution to the equation (4) relative to these spaces.
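To make the equivalence concrete (a standard computation, stated here under the eigenfunction representation above and for sufficiently regular $\phi, \psi$):

$$\|\phi\|_1^2 = \langle A^{1/2}\phi, A^{1/2}\phi \rangle = \sum_{k=1}^{\infty} \lambda_k \phi_k^2, \qquad \langle \phi, \psi \rangle_1 = \langle A\phi, \psi \rangle = \langle \nabla\phi, \nabla\psi \rangle,$$

where the last equality uses that $A = -\mathcal{P}\Delta$, integration by parts on the torus, and the divergence-free condition; equivalence with the full $W^{1,2}(\mathbb{T}^3; \mathbb{R}^3)$ inner product then follows from the zero-average condition via the Poincaré inequality.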

#### *2.3 Notions of Solution and Results*

We frame this definition for an $\mathcal{F}_0$-measurable $u_0 : \Omega \to W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$. Here and throughout we use the notation $\mathbf{1}$ for the indicator function.

**Definition 1** A pair $(u, \tau)$, where $\tau$ is a $\mathbb{P}-a.s.$ positive stopping time and $u$ is a process such that for $\mathbb{P}-a.e.$ $\omega$, $u_\cdot(\omega) \in C\left([0,T]; W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)\right)$ and $u_\cdot(\omega)\mathbf{1}_{\cdot \le \tau(\omega)} \in L^2\left([0,T]; W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)\right)$ for all $T > 0$, with $u_\cdot\mathbf{1}_{\cdot \le \tau}$ progressively measurable in $W^{2,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, is said to be a local strong solution of the equation (4) if the identity

$$\begin{aligned} u_t - u_0 + \int_0^{t \wedge \tau} \mathcal{P} \mathcal{L}_{u_s} u_s \, ds + \int_0^{t \wedge \tau} A u_s \, ds \\ - \frac{1}{2} \sum_{i=1}^{\infty} \int_0^{t \wedge \tau} \mathcal{P} B_i^2 u_s \, ds + \sum_{i=1}^{\infty} \int_0^{t \wedge \tau} \mathcal{P} B_i u_s \, dW_s^i = 0 \end{aligned} \tag{6}$$

holds $\mathbb{P}-a.s.$ in $L^2_\sigma(\mathbb{T}^3; \mathbb{R}^3)$ for all $t \ge 0$.

We shall address why this definition makes sense in the abstract setting in Sect. 3.3, before then translating this abstract framework back to our Navier-Stokes Equation.
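For intuition, the Itô-Stratonovich conversion behind (4) can be sketched heuristically (a formal computation of ours; the rigorous treatment is in [15, Subsection 2.3]). Ignoring the drift terms, (1) reads $du = -Bu \circ d\mathcal{W}$, and then

$$B_i u_s \circ dW_s^i = B_i u_s \, dW_s^i + \frac{1}{2} \, d\left[ B_i u, W^i \right]_s = B_i u_s \, dW_s^i - \frac{1}{2} B_i^2 u_s \, ds,$$

since by linearity of $B_i$ we have $d(B_i u) = -\sum_j B_i B_j u \circ dW^j$, so the cross-variation with $W^i$ contributes $-B_i^2 u \, ds$. Summing over $i$, applying $\mathcal{P}$, and using that $\mathcal{P}B_i = \mathcal{P}B_i\mathcal{P}$ recovers the Itô terms appearing in (4).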

**Definition 2** A pair $(u, \Theta)$ such that there exists a sequence of stopping times $(\theta_j)$ which are $\mathbb{P}-a.s.$ monotone increasing and convergent to $\Theta$, whereby $(u_{\cdot \wedge \theta_j}, \theta_j)$ is a local strong solution of the equation (4) for each $j$, is said to be a maximal strong solution of the equation (4) if for any other pair $(v, \Gamma)$ with this property, $\Theta \le \Gamma$ $\mathbb{P}-a.s.$ implies $\Theta = \Gamma$ $\mathbb{P}-a.s.$

**Definition 3** A maximal strong solution $(u, \Theta)$ of the equation (4) is said to be unique if for any other such solution $(v, \Gamma)$, we have $\Theta = \Gamma$ $\mathbb{P}-a.s.$ and for all $t \in [0, \Theta)$,

$$\mathbb{P}\left(\{\omega \in \Omega \, : \, u_t(\omega) = v_t(\omega)\}\right) = 1.$$

We can now state the main result of the paper.

**Theorem 1** *For any given $\mathcal{F}_0$-measurable $u_0 : \Omega \to W^{1,2}_\sigma(\mathbb{T}^3; \mathbb{R}^3)$, there exists a unique maximal strong solution $(u, \Theta)$ of the equation (4). Moreover at $\mathbb{P}-a.e.$ $\omega$ for which $\Theta(\omega) < \infty$, we have that*

$$\sup\_{r \in [0, \Theta(\omega))} \|u\_r(\omega)\|\_1^2 + \int\_0^{\Theta(\omega)} \|u\_r(\omega)\|\_2^2 dr = \infty. \tag{7}$$

#### **3 Abstract Framework and Results**

We now establish the abstract framework through which we arrive at Theorem 1. This involves giving two sets of assumptions before exploring the abstract method with the assumptions in place, and then in Sect. 4.2 discussing how (4) fits into this framework. These assumption sets pertain to two different notions of solution (both strong in the probabilistic sense but related to different spaces), the reason for which will be illustrated in Sect. 4. We give these as two distinct sets of assumptions so that, in the event that an equation fits the first set of assumptions but not the second, we would still be able to conclude that some type of solution exists for the equation.

#### *3.1 Assumption Set 1*

We work with a quartet of continuously embedded Hilbert Spaces

$$V \hookrightarrow H \hookrightarrow U \hookrightarrow X$$

and the operators

$$\mathcal{A}: [0,\infty) \times V \to U,$$

$$\mathcal{G}: [0,\infty) \times V \to \mathscr{L}^2(\mathfrak{U}; H).$$

We ask that there is a continuous bilinear form $\langle \cdot, \cdot \rangle_{X \times H} : X \times H \to \mathbb{R}$ such that for $\phi \in U$ and $\psi \in H$,

$$
\langle \phi, \psi \rangle\_{X \times H} = \langle \phi, \psi \rangle\_U. \tag{8}
$$

Moreover the continuity and bilinearity ensures that there exists some constant *c* whereby for all such *φ,ψ*,

$$|\langle \phi, \psi \rangle_{X \times H}| \le c\|\phi\|_X \|\psi\|_H. \tag{9}$$

As we look to use a Galerkin Scheme to solve our equation, we introduce now a sequence of spaces *(Vn)* contained in *V* given by *Vn* := span {*a*1*,...,an*} for *(an)* an orthogonal basis in *U*. Defining P*<sup>n</sup>* to be the orthogonal projection onto *Vn* in *X*, we shall also assume that the restriction of P*<sup>n</sup>* to *U* is an orthogonal projection in *U* and that the sequence of these projections is uniformly bounded on *H*: that is, that there exists some constant *c* independent of *n* such that for all *φ* ∈ *H*,

$$\|\mathcal{P}_n \phi\|_H^2 \le c\|\phi\|_H^2. \tag{10}$$

We also require the existence of a real valued sequence *(μn)* with *μn* → ∞, which is such that for any *φ* ∈ *U* and *ψ* ∈ *H*,

$$\|(I - \mathcal{P}\_n)\phi\|\_X \le \frac{1}{\mu\_n} \|\phi\|\_U,\tag{11}$$

$$\|(I - \mathcal{P}\_n)\psi\|\_U \le \frac{1}{\mu\_n} \|\psi\|\_H \tag{12}$$

where $I$ represents the identity operator in the corresponding spaces. These assumptions are of course supplemented by a series of assumptions on the operators. We shall use the general notation $c_t$ to represent a function $c_\cdot : [0,\infty) \to \mathbb{R}$ bounded on $[0,T]$ for any $T > 0$, evaluated at the time $t$. Moreover we define functions $K$, $\tilde{K}$ relative to some non-negative constants $p, q, \tilde{p}, \tilde{q}$. We use a generic notation to define the functions $K : U \to \mathbb{R}$, $K : U \times U \to \mathbb{R}$, $\tilde{K} : H \to \mathbb{R}$ and $\tilde{K} : H \times H \to \mathbb{R}$ by

$$K(\phi) := 1 + \|\phi\|_U^p,$$

$$K(\phi, \psi) := 1 + \|\phi\|_U^p + \|\psi\|_U^q,$$

$$\tilde{K}(\phi) := K(\phi) + \|\phi\|_H^{\tilde{p}},$$

$$\tilde{K}(\phi, \psi) := K(\phi, \psi) + \|\phi\|_H^{\tilde{p}} + \|\psi\|_H^{\tilde{q}}.$$

Distinct uses of the function $K$ will depend on different constants, but in no meaningful way in our applications, hence no explicit reference to them shall be made. In the case of $\tilde{K}$, when $\tilde{p}, \tilde{q} = 2$ we shall denote the general $\tilde{K}$ by $\tilde{K}_2$. In this case no further assumptions are made on the $p, q$. That is, $\tilde{K}_2$ has the general representation

$$\tilde{K}\_2(\phi, \psi) = K(\phi, \psi) + \left\|\phi\right\|\_{H}^{2} + \left\|\psi\right\|\_{H}^{2} \tag{13}$$

and similarly as a function of one variable.

We state the assumptions for arbitrary elements $\phi, \psi \in V$, $\phi^n \in V_n$ and $t \in [0,\infty)$, and a fixed $\kappa > 0$. Understanding $\mathcal{G}$ as an operator $\mathcal{G} : [0,\infty) \times V \times \mathfrak{U} \to H$, we introduce the notation $\mathcal{G}_i(\cdot, \cdot) := \mathcal{G}(\cdot, \cdot, e_i)$.

**Assumption 1** *For any $T > 0$, $\mathcal{A} : [0,T] \times V \to U$ and $\mathcal{G} : [0,T] \times V \to \mathscr{L}^2(\mathfrak{U}; H)$ are measurable.*

*Remark 1* Measurability here and throughout the paper is defined with respect to the Borel Sigma Algebra on the relevant Hilbert Spaces.

#### **Assumption 2**

$$\|\mathcal{A}(t,\phi)\|_U^2 + \sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi)\|_H^2 \le c_t K(\phi)\left[1 + \|\phi\|_V^2\right], \tag{14}$$

$$\|\mathcal{A}(t,\phi) - \mathcal{A}(t,\psi)\|_X \le c_t\left[K(\phi,\psi) + \|\phi\|_V + \|\psi\|_V\right]\|\phi - \psi\|_H, \tag{15}$$

$$\sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi) - \mathcal{G}_i(t,\psi)\|_X \le c_t K(\phi,\psi)\|\phi - \psi\|_H. \tag{16}$$

#### **Assumption 3**

$$2\langle \mathcal{P}_n\mathcal{A}(t,\phi^n), \phi^n \rangle_H + \sum_{i=1}^{\infty} \|\mathcal{P}_n\mathcal{G}_i(t,\phi^n)\|_H^2 \le c_t\tilde{K}_2(\phi^n)\left[1 + \|\phi^n\|_H^2\right] - \kappa\|\phi^n\|_V^2, \tag{17}$$

$$\sum_{i=1}^{\infty} \langle \mathcal{P}_n\mathcal{G}_i(t,\phi^n), \phi^n \rangle_H^2 \le c_t\tilde{K}_2(\phi^n)\left[1 + \|\phi^n\|_H^4\right]. \tag{18}$$

#### **Assumption 4**

$$\begin{aligned} 2\langle \mathcal{A}(t,\phi) - \mathcal{A}(t,\psi), \phi - \psi \rangle_U &+ \sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi) - \mathcal{G}_i(t,\psi)\|_U^2 \\ &\le c_t\tilde{K}_2(\phi,\psi)\|\phi - \psi\|_U^2 - \kappa\|\phi - \psi\|_H^2, \end{aligned} \tag{19}$$

$$\sum_{i=1}^{\infty} \langle \mathcal{G}_i(t,\phi) - \mathcal{G}_i(t,\psi), \phi - \psi \rangle_U^2 \le c_t\tilde{K}_2(\phi,\psi)\|\phi - \psi\|_U^4. \tag{20}$$

#### **Assumption 5**

$$2\langle \mathcal{A}(t,\phi), \phi \rangle_U + \sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi)\|_U^2 \le c_t K(\phi)\left[1 + \|\phi\|_H^2\right], \tag{21}$$

$$\sum_{i=1}^{\infty} \langle \mathcal{G}_i(t,\phi), \phi \rangle_U^2 \le c_t K(\phi)\left[1 + \|\phi\|_H^4\right]. \tag{22}$$
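For orientation (a heuristic of ours, not part of the formal development): Assumption 3 is shaped exactly to close an energy estimate for the Galerkin approximations considered in Sect. 4. Formally, the Itô formula gives

$$d\|\Psi_t^n\|_H^2 = \left(2\langle \mathcal{P}_n\mathcal{A}(t,\Psi_t^n), \Psi_t^n \rangle_H + \sum_{i=1}^{\infty} \|\mathcal{P}_n\mathcal{G}_i(t,\Psi_t^n)\|_H^2\right)dt + 2\sum_{i=1}^{\infty}\langle \mathcal{P}_n\mathcal{G}_i(t,\Psi_t^n), \Psi_t^n \rangle_H\,dW_t^i,$$

so (17) bounds the finite variation part by $c_t\tilde{K}_2(\Psi_t^n)\left[1 + \|\Psi_t^n\|_H^2\right] - \kappa\|\Psi_t^n\|_V^2$, whilst (18) controls the quadratic variation of the martingale part for use with the Burkholder-Davis-Gundy inequality; Assumption 5 plays the same role one level down, with $U$ and $H$ in place of $H$ and $V$.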

#### *3.2 Assumption Set 2*

These assumptions are only checked in addition to Assumption Set 1, and so take place in the same framework. We state the assumptions now for arbitrary elements $\phi, \psi \in H$ and $t \in [0,\infty)$, and continue to use the $c, K, \tilde{K}, \kappa$ notation of Assumption Set 1.

**Assumption 6** *For any $T > 0$, $\mathcal{A} : [0,T] \times H \to X$ is measurable, and whenever $\Phi$ is a progressively measurable process in $H$ we have that $\mathcal{G}(\cdot, \Phi_\cdot)$ is progressively measurable in $\mathscr{L}^2(\mathfrak{U}; U)$.*

#### **Assumption 7**

$$\|\mathcal{A}(t,\phi)\|_X^2 + \sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi)\|_U^2 \le c_t K(\phi)\left[1 + \|\phi\|_H^2\right], \tag{23}$$

$$\|\mathcal{A}(t,\phi) - \mathcal{A}(t,\psi)\|_X \le c_t\left[K(\phi,\psi) + \|\phi\|_H + \|\psi\|_H\right]\|\phi - \psi\|_H. \tag{24}$$

#### **Assumption 8**

$$\begin{aligned} 2\langle \mathcal{A}(t,\phi) - \mathcal{A}(t,\psi), \phi - \psi \rangle_X &+ \sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi) - \mathcal{G}_i(t,\psi)\|_X^2 \\ &\le c_t\tilde{K}_2(\phi,\psi)\|\phi - \psi\|_X^2, \end{aligned} \tag{25}$$

$$\sum_{i=1}^{\infty} \langle \mathcal{G}_i(t,\phi) - \mathcal{G}_i(t,\psi), \phi - \psi \rangle_X^2 \le c_t\tilde{K}_2(\phi,\psi)\|\phi - \psi\|_X^4. \tag{26}$$

We in fact state Assumption 9 for $\phi \in V$ and some $\kappa > 0$, making it a stronger assumption than Assumption 5.

**Assumption 9** *With the stricter requirement that φ* ∈ *V then*

$$2\langle \mathcal{A}(t,\phi), \phi \rangle_U + \sum_{i=1}^{\infty} \|\mathcal{G}_i(t,\phi)\|_U^2 \le c_t K(\phi) - \kappa\|\phi\|_H^2, \tag{27}$$

$$\sum_{i=1}^{\infty} \langle \mathcal{G}_i(t,\phi), \phi \rangle_U^2 \le c_t K(\phi). \tag{28}$$

#### *3.3 Notions of Solution and Results*

Here we define the two different notions of solution, which we call $V$-valued solutions and $H$-valued solutions. The corresponding definitions of uniqueness and maximality are given jointly for both notions of solution. We frame the definition of the $V$-valued solutions for an initial condition $\Psi_0 : \Omega \to H$ which is an $\mathcal{F}_0$-measurable mapping, and for the $H$-valued solutions a $\Psi_0 : \Omega \to U$ which is likewise $\mathcal{F}_0$-measurable.

**Definition 4** A pair $(\Psi, \tau)$, where $\tau$ is a $\mathbb{P}-a.s.$ positive stopping time and $\Psi$ is a process such that for $\mathbb{P}-a.e.$ $\omega$, $\Psi_\cdot(\omega) \in C([0,T]; H)$ and $\Psi_\cdot(\omega)\mathbf{1}_{\cdot \le \tau(\omega)} \in L^2([0,T]; V)$ for all $T > 0$, with $\Psi_\cdot\mathbf{1}_{\cdot \le \tau}$ progressively measurable in $V$, is said to be a $V$-valued local strong solution of the equation (2) if the identity

$$\Psi_t = \Psi_0 + \int_0^{t \wedge \tau} \mathcal{A}(s, \Psi_s)\,ds + \int_0^{t \wedge \tau} \mathcal{G}(s, \Psi_s)\,d\mathcal{W}_s \tag{29}$$

holds $\mathbb{P}-a.s.$ in $U$ for all $t \ge 0$.

*Remark 2* If $(\Psi, \tau)$ is a $V$-valued local strong solution of the equation (2), then $\Psi_\cdot = \Psi_{\cdot \wedge \tau}$.

*Remark 3* The progressive measurability condition on $\Psi_\cdot\mathbf{1}_{\cdot \le \tau}$ may look a little suspect, as $\Psi_0$ itself may only belong to $H$ and not $V$, making it impossible for $\Psi_\cdot\mathbf{1}_{\cdot \le \tau}$ to be even adapted in $V$. We are mildly abusing notation here; what we really ask is that there exists a process $\Phi$ which is progressively measurable in $V$ and such that $\Phi_\cdot = \Psi_\cdot\mathbf{1}_{\cdot \le \tau}$ almost surely over the product space $\Omega \times [0,\infty)$ with product measure $\mathbb{P} \times \lambda$, for $\lambda$ the Lebesgue measure on $[0,\infty)$.

*Remark 4* If Assumption 1 and (14) hold, then the time integral is well defined in *U* and the stochastic integral is well defined as a local martingale in *H*.

**Definition 5** A pair $(\Psi, \tau)$, where $\tau$ is a $\mathbb{P}-a.s.$ positive stopping time and $\Psi$ is a process such that for $\mathbb{P}-a.e.$ $\omega$, $\Psi_\cdot(\omega) \in C([0,T]; U)$ and $\Psi_\cdot(\omega)\mathbf{1}_{\cdot \le \tau(\omega)} \in L^2([0,T]; H)$ for all $T > 0$, with $\Psi_\cdot\mathbf{1}_{\cdot \le \tau}$ progressively measurable in $H$, is said to be an $H$-valued local strong solution of the equation (2) if the identity

$$\Psi\_t = \Psi\_0 + \int\_0^{t \wedge \tau} \mathcal{A}(s, \Psi\_s) ds + \int\_0^{t \wedge \tau} \mathcal{G}(s, \Psi\_s) d\mathcal{W}\_s \tag{30}$$

holds $\mathbb{P}-a.s.$ in $X$ for all $t \ge 0$.

*Remark 5* The analogues of Remarks 2 and 3 hold for the $H$-valued solutions.

*Remark 6* If Assumption 6 and (23) hold, then the time integral is well defined in *X* and the stochastic integral is well defined as a local martingale in *U*.

In the following we use *V* ; *H* to mean *V* or *H* respectively.

**Definition 6** A pair $(\Psi, \Theta)$ such that there exists a sequence of stopping times $(\theta_j)$ which are $\mathbb{P}-a.s.$ monotone increasing and convergent to $\Theta$, whereby $(\Psi_{\cdot \wedge \theta_j}, \theta_j)$ is a $(V; H)$-valued local strong solution of the equation (2) for each $j$, is said to be a $(V; H)$-valued maximal strong solution of the equation (2) if for any other pair $(\Phi, \Gamma)$ with this property, $\Theta \le \Gamma$ $\mathbb{P}-a.s.$ implies $\Theta = \Gamma$ $\mathbb{P}-a.s.$

**Definition 7** A $(V; H)$-valued maximal strong solution $(\Psi, \Theta)$ of the equation (2) is said to be unique if for any other such solution $(\Phi, \Gamma)$, we have $\Theta = \Gamma$ $\mathbb{P}-a.s.$ and for all $t \in [0, \Theta)$,

$$\mathbb{P}\left(\{\omega \in \Omega \, : \, \Psi_t(\omega) = \Phi_t(\omega)\}\right) = 1.$$

**Theorem 2** *Suppose that Assumption Set 1 holds. Then for any given $\mathcal{F}_0$-measurable $\Psi_0 : \Omega \to H$, there exists a unique $V$-valued maximal strong solution $(\Psi, \Theta)$ of the equation (2). Moreover at $\mathbb{P}-a.e.$ $\omega$ for which $\Theta(\omega) < \infty$, we have that*

$$\sup\_{r \in [0, \Theta(\omega))} \left\| \Psi\_r(\omega) \right\|\_H^2 + \int\_0^{\Theta(\omega)} \left\| \Psi\_r(\omega) \right\|\_V^2 dr = \infty. \tag{31}$$

**Theorem 3** *Suppose that Assumption Sets 1 and 2 hold. Then for any given $\mathcal{F}_0$-measurable $\Psi_0 : \Omega \to U$, there exists a unique $H$-valued maximal strong solution $(\Psi, \Theta)$ of the equation (2). Moreover at $\mathbb{P}-a.e.$ $\omega$ for which $\Theta(\omega) < \infty$, we have that*

$$\sup\_{r \in [0, \Theta(\omega))} \left\| \Psi\_r(\omega) \right\|\_U^2 + \int\_0^{\Theta(\omega)} \left\| \Psi\_r(\omega) \right\|\_H^2 dr = \infty. \tag{32}$$

#### **4 Abstract Solution Method and Application**

In this final section we give the main steps of the proofs of Theorems 2 and 3, followed by a brief exposition of how our SALT Navier-Stokes Equation fits into this framework.
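Although the precise verification of the assumptions is deferred to [6], the identification one has in mind when reading the abstract equation (2) against the projected equation (4) is simply (noting that the sign of the noise coefficient is immaterial, as $\mathcal{W}$ and $-\mathcal{W}$ share the same law):

$$\mathcal{A}(t, \phi) := -\mathcal{P}\mathcal{L}_\phi \phi - A\phi + \frac{1}{2}\sum_{i=1}^{\infty} \mathcal{P}B_i^2 \phi, \qquad \mathcal{G}_i(t, \phi) := -\mathcal{P}B_i \phi.$$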

#### *4.1 Abstract Solution Method*

*Proof (Theorem 2)* We suppose that Assumption Set 1 holds and address the question first for an initial condition $\Psi_0$ which is such that for $\mathbb{P}-a.e.$ $\omega$,

$$\|\Psi_0(\omega)\|_H^2 \le M' \tag{33}$$

for some constant $M'$. We work with this bounded initial condition in the first instance as we shall use local solutions up to first hitting times given in terms of the initial condition, so this boundedness translates to boundedness of the relevant process up until these times. As directed in Sect. 3.1 we are to use a Galerkin Scheme, whereby we consider the equations

$$
\Psi_t^n = \Psi_0^n + \int_0^t \mathcal{P}_n\mathcal{A}(s, \Psi_s^n)\,ds + \int_0^t \mathcal{P}_n\mathcal{G}(s, \Psi_s^n)\,d\mathcal{W}_s \tag{34}
$$

with notation $\mathcal{P}_n\mathcal{G}(\cdot, \cdot, e_i) := \mathcal{P}_n\mathcal{G}_i(\cdot, \cdot)$ and $\Psi_0^n := \mathcal{P}_n\Psi_0$. A local strong solution of this equation is defined as a pair $(\Psi^n, \tau)$ where $\tau$ is a $\mathbb{P}-a.s.$ positive stopping time and $\Psi^n$ is an adapted process in $V_n$ such that for $\mathbb{P}-a.e.$ $\omega$, $\Psi^n_\cdot(\omega) \in C([0,T]; V_n)$ for all $T > 0$, and the identity

$$\Psi\_t^n = \Psi\_0^n + \int\_0^{t \wedge \tau} \mathcal{P}\_n \mathcal{A}(s, \Psi\_s^n) ds + \int\_0^{t \wedge \tau} \mathcal{P}\_n \mathcal{G}(s, \Psi\_s^n) d\mathcal{W}\_s \tag{35}$$

holds $\mathbb{P}-a.s.$ in $V_n$ for all $t \ge 0$. We can conclude that for any fixed $t > 0$ and $M > 1$, a local strong solution $(\Psi^n, \tau_n^{M,t})$ of (34) exists for the stopping time $\tau_n^{M,t}$ defined by

$$\tau\_n^{M,t} := t \wedge \inf \left\{ s \ge 0 : \sup\_{r \in [0,s]} \| \Psi\_r^n \|\_U^2 + \int\_0^s \| \Psi\_r^n \|\_H^2 dr \ge M + \| \Psi\_0^n \|\_U^2 \right\}. \tag{36}$$

This conclusion is reached thanks to Assumption 2, through standard theory in the finite dimensional Hilbert Space *Vn* though some care must be taken for the infinite dimensional Brownian Motion. Understanding that

$$\|\Psi\_0^n(\omega)\|\_{H}^2 \le c \|\Psi\_0(\omega)\|\_{H}^2 \le cM' \tag{37}$$

coming from (10) and (33), it is clear that

$$\|\Psi_0^n(\omega)\|_U^2 \le \tilde{M} \tag{38}$$

for some $\tilde{M}$ clearly still independent of $n$ and $\omega$. Thus we see that the bound

$$\sup_{r \in [0, \tau_n^{M,t}(\omega)]} \left\|\Psi_r^n(\omega)\right\|_U^2 + \int_0^{\tau_n^{M,t}(\omega)} \left\|\Psi_s^n(\omega)\right\|_H^2 ds \le M + \tilde{M} \tag{39}$$

holds true for every $n$ and $\mathbb{P}$-*a.e.* $\omega$. This boundedness plays a significant role in our analysis and demonstrates the importance of starting from this bounded initial condition in the first instance. The motivation for choosing these stopping times comes from the work of Glatt-Holtz and Ziane [14]. The authors prove an abstract result which is the central theorem of that paper, and which we restate in the Appendix as Theorem 4. In the original paper, the authors use the traditional Galerkin Scheme for Navier-Stokes (given by the basis of eigenfunctions of the Stokes Operator) and apply this theorem directly with the spaces $\mathcal{H}_1 := W^{2,2}_\sigma(\mathbb{T}^3;\mathbb{R}^3)$, $\mathcal{H}_2 := W^{1,2}_\sigma(\mathbb{T}^3;\mathbb{R}^3)$. We have to take a slight detour from this method in the case of transport noise, due to the condition (47). Translating this to our framework through $\mathcal{H}_1 = H$ and $\mathcal{H}_2 = U$, the idea in showing this condition is to apply the Itô Formula in $U$ to the difference process $\Psi^n - \Psi^m$. When we simplify the term arising from the quadratic variation of the stochastic integral, we must control

$$\sum\_{i=1}^{\infty} \| [I - \mathcal{P}\_m] \mathcal{G}\_i(s, \Psi\_s^m) \|\_{U}^2$$

which we would do via (12) and (10) to bound the above by

$$\sum_{i=1}^{\infty} \frac{1}{\mu_m} \left\|\mathcal{G}_i(s, \Psi_s^m)\right\|_{H}^2.$$

In order to send this to zero as $m \to \infty$ we use some uniform boundedness of the term $\sum_{i=1}^\infty \|\mathcal{G}_i(s,\Psi^m_s)\|_H^2$, which in the case of a Lipschitz operator as in the original paper is immediate from (39). Where $\mathcal{G}_i$ is a differential operator we must obtain uniform boundedness of the solutions $(\Psi^n)$ in a higher norm, hence the need for our space $V$ (which in the context of our SALT Navier-Stokes equation would be $W^{3,2}_\sigma(\mathbb{T}^3;\mathbb{R}^3)$). For this reason we must introduce another step to the proof, whereby we show that there exist constants $C, \tilde{C}$ dependent on $M, M', t$ but independent of $n$ such that for the local strong solution $(\Psi^n, \tau_n^{M,t})$ of (34),

$$\mathbb{E} \sup\_{r \in [0, \tau\_n^{M, t}]} \|\Psi\_r^n\|\_{H}^2 + \mathbb{E} \int\_0^{\tau\_n^{M, t}} \|\Psi\_s^n\|\_V^2 ds \le C \left[ \mathbb{E} \left( \|\Psi\_0^n\|\_H^2 \right) + 1 \right] \tag{40}$$

and in particular

$$\mathbb{E} \sup\_{r \in [0, \tau\_n^{M, t}]} \|\Psi\_r^n\|\_H^2 + \mathbb{E} \int\_0^{\tau\_n^{M, t}} \|\Psi\_s^n\|\_V^2 ds \le \tilde{C}.\tag{41}$$

This result is proven by considering $V_n$ as a Hilbert Space with the $H$ inner product, applying the Itô Formula in this context and using Assumption 3. Equation (41) then follows from (40) due to (10), so we see the significance of starting from an initial condition bounded in $H$ and not just $U$ (or at least, square integrable in $H$). From Assumption 4, along with the requirement that each $\mathcal{P}_n$ is an orthogonal projection in $X$ and $U$ and the conditions (8), (11), (12), we deduce that for any $m < n$ and $\lambda_m := \min\{\mu_m, \mu_m^2\}$,

$$\begin{aligned} &2\langle\mathcal{P}_{n}\mathcal{A}(t,\Phi)-\mathcal{P}_{m}\mathcal{A}(t,\Psi),\Phi-\Psi\rangle_{U}+\sum_{i=1}^{\infty}\left\|\mathcal{P}_{n}\mathcal{G}_{i}(t,\Phi)-\mathcal{P}_{m}\mathcal{G}_{i}(t,\Psi)\right\|_{U}^{2} \\ &\quad\leq c_{t}\tilde{K}_{2}(\Phi,\Psi)\|\Phi-\Psi\|_{U}^{2}-\frac{\kappa}{2}\|\Phi-\Psi\|_{H}^{2}+\frac{c_{t}}{\lambda_{m}}K\left(\Phi,\Psi\right)\left[1+\|\Phi\|_{V}^{2}+\|\Psi\|_{V}^{2}\right], \\ &\sum_{i=1}^{\infty}\langle\mathcal{P}_{n}\mathcal{G}_{i}(t,\Phi)-\mathcal{P}_{m}\mathcal{G}_{i}(t,\Psi),\Phi-\Psi\rangle_{U}^{2} \\ &\quad\leq c_{t}\tilde{K}_{2}(\Phi,\Psi)\|\Phi-\Psi\|_{U}^{4}+\frac{c_{t}}{\lambda_{m}}K\left(\Phi,\Psi\right)\left[1+\|\Psi\|_{V}^{2}\right]. \end{aligned}$$

Along with (41) these bounds allow us to conclude that

$$\lim_{m \to \infty} \sup_{n \ge m}\left[\mathbb{E}\sup_{r \in [0, \tau_m^{M,t} \wedge \tau_n^{M,t}]} \|\Psi_r^n - \Psi_r^m\|_{U}^2 + \mathbb{E}\int_0^{\tau_m^{M,t} \wedge \tau_n^{M,t}} \|\Psi_s^n - \Psi_s^m\|_{H}^2\, ds\right] = 0 \tag{42}$$

again via an application of the Itô Formula for $V_n$ considered as a Hilbert Space with the $U$ inner product, applied to the difference process $\Psi^n - \Psi^m$. With similar ideas and Assumption 5, we infer that

$$\lim_{S \to 0} \sup_{n \in \mathbb{N}} \mathbb{P}\left(\left\{\sup_{r \in [0, \tau_n^{M,t} \wedge S]} \|\Psi_r^n\|_{U}^2 + \int_0^{\tau_n^{M,t} \wedge S} \|\Psi_r^n\|_{H}^2\, dr \ge M - 1 + \|\Psi_0^n\|_U^2\right\}\right) = 0. \tag{43}$$

We then apply Theorem 4 for $\mathcal{H}_1 = H$, $\mathcal{H}_2 = U$ and claim that the resulting pair $(\Psi, \tau_\infty^{M,t})$ satisfies the additional properties that:


$$\mathbb{E}\left[\sup\_{r\in[0,\tau\_{\infty}^{M,t}]} \left\Vert \Psi\_r^{n\_l} - \Psi\_r \right\Vert\_U^2 + \int\_0^{\tau\_{\infty}^{M,t}} \left\Vert \Psi\_r^{n\_l} - \Psi\_r \right\Vert\_H^2 dr\right] \longrightarrow 0. \tag{44}$$

Indeed the first two are true from using the uniform boundedness (41) and taking weakly convergent subsequences in the appropriate spaces, then using uniqueness of limits in the weak topology and the embeddings $V \hookrightarrow H \hookrightarrow U$ to identify this limit with $\Psi$. The weak convergence preserves the measurability, and so the progressive measurability of each $\Psi^n$ (from the continuity and adaptedness in $V_n$) is what gives the result here. The final item is then a simple application of the dominated convergence theorem. To conclude that $(\Psi, \tau_\infty^{M,t})$ is a $V$-valued local strong solution it only remains to show the identity (29), which is done by taking limits of the corresponding terms in (35) and applying (15), (16) alongside the already used assumptions on the $(\mathcal{P}_n)$. We take the limit in $X$ and argue that the identity being satisfied in $X$ is sufficient to conclude that the identity holds in $U$, given that all integrals can be constructed in $U$ from the regularity of the solution.
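The mechanism that makes the Cauchy property attainable, namely that projection tails cost a factor $\mu_m^{-1}$ for functions with one extra degree of regularity, can be checked numerically in a simple Fourier caricature. The sketch below assumes $\mathcal{P}_m$ projects onto the first $m$ Fourier modes of a scalar function with eigenvalues $\mu_k = k^2$; it illustrates the inequality underlying (12), not the paper's actual operators.

```python
import numpy as np

rng = np.random.default_rng(1)

N = 512                         # ambient (large) Fourier truncation
k = np.arange(1, N + 1)
mu = k.astype(float)**2         # eigenvalues mu_k of the underlying operator

# a random function with one extra degree of regularity: finite 'H' norm
coeffs = rng.standard_normal(N) / k**2

norm_H_sq = np.sum(mu * coeffs**2)      # ||phi||_H^2: one derivative more than U

for m in (8, 32, 128):
    tail_sq = np.sum(coeffs[m:]**2)     # ||(I - P_m) phi||_U^2
    bound = norm_H_sq / mu[m]           # the spectral tail estimate
    assert tail_sq <= bound             # holds since mu_k is increasing
    print(f"m={m:4d}  tail^2={tail_sq:.3e}  bound={bound:.3e}")
```

The assertion holds because on the tail $\mu_k \ge \mu_{m+1}$, so the $H$ norm dominates $\mu_{m+1}$ times the $U$ norm of the tail; the bound visibly shrinks as $m$ grows.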

We have now shown the existence of a $V$-valued local strong solution, but only for the bounded initial condition (33). We then show a uniqueness result for such solutions, which is: suppose that $(\Psi^1, \tau_1)$ and $(\Psi^2, \tau_2)$ are two $V$-valued local strong solutions of the equation (2) for a given initial condition $\Psi_0$. Then for all $s \in [0,\infty)$,

$$\mathbb{P}\left(\left\{\omega \in \Omega : \Psi^{1}_{s\wedge\tau_{1}(\omega)\wedge\tau_{2}(\omega)}(\omega) = \Psi^{2}_{s\wedge\tau_{1}(\omega)\wedge\tau_{2}(\omega)}(\omega)\right\}\right) = 1.$$

This is proven by applying Assumption 4 in the context of an Itô Formula in $U$ for the difference process of any two solutions. With this uniqueness in place we then conclude the results of Theorem 2, but still for the bounded initial condition, via similar arguments to those used in [14]. To pass to a general initial condition we consider a sequence of such maximal strong solutions $(\Psi^k, \Theta^k)$ corresponding to the bounded initial conditions $(\Psi_0\mathbf{1}_{k \le \|\Psi_0\|_H < k+1})$ and use the maximality on these pieces to show that the pair $(\Psi, \Theta)$ defined at each time $t \in [0,T]$ and $\omega \in \Omega$ by

$$\Psi_t(\omega) := \sum_{k=1}^{\infty} \Psi_t^k(\omega)\,\mathbf{1}_{k \le \|\Psi_0(\omega)\|_{H} < k+1}, \quad \Theta(\omega) := \sum_{k=1}^{\infty} \Theta^k(\omega)\,\mathbf{1}_{k \le \|\Psi_0(\omega)\|_{H} < k+1}$$

is our desired solution for the initial condition $\Psi_0$ (where the limit for $\Psi$ is in reality just a finite sum). It is clear that for any $\omega$ there exists a $k$ such that $(\Psi(\omega), \Theta(\omega)) = (\Psi^k(\omega), \Theta^k(\omega))$, so the property (31) follows from the same property in the case of the bounded initial condition. This rounds off our discussion of the proof of Theorem 2.
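The patching construction can be caricatured in code: the sum over $k$ is really a measurable selection, since for each $\omega$ exactly one indicator fires. In the toy sketch below the "solutions" are labelled placeholders and the sampled norms are illustrative; everything here is an assumption made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)

# ||Psi_0(omega)||_H for a handful of sample points omega
psi0_norms = rng.uniform(1.0, 6.0, size=8)

def solution_for_shell(k, omega):
    """Stand-in for the maximal solution Psi^k built for initial data
    on the shell k <= ||Psi_0||_H < k + 1 (a labelled placeholder)."""
    return ("Psi^k", k, omega)

# Psi(omega) := sum_k Psi^k(omega) 1_{k <= ||Psi_0(omega)||_H < k+1}:
# for each omega exactly one indicator fires, so the sum selects one term.
patched = [solution_for_shell(int(np.floor(r)), omega)
           for omega, r in enumerate(psi0_norms)]

for (_, k, omega) in patched:
    assert k <= psi0_norms[omega] < k + 1   # the selected shell matches
```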

In the case where Assumption Set 2 holds, we then look to use the $V$-valued local strong solutions to obtain an $H$-valued local strong solution, but now just for a $U$-valued initial condition. At this juncture it is well worth addressing the question of why we consider these distinct types of solution; that is, if we wanted an $H$-valued local strong solution, then why not restate Assumption Set 1 for the spaces $V$ as $H$, $H$ as $U$ and $U$ as $X$? The reason lies in the application to our stochastic Navier-Stokes equation, which would then not satisfy the required assumption. This will be discussed more explicitly in Sect. 4.2.

*Proof (Theorem 3)* The idea now is to apply this existence result to the sequence of initial conditions $(\mathcal{P}_n\Psi_0)$, and to apply the same Theorem 4 argument to the corresponding sequence of solutions. From here we now need to suppose that Assumption Set 2 holds in addition to Assumption Set 1. In the same manner we start again from a bounded $\Psi_0$, this time such that

$$\|\Psi\_0(\omega)\|\_U^2 \le \tilde{M}.\tag{45}$$

We could immediately apply Theorem 2 for each initial condition $\mathcal{P}_n\Psi_0$, though we want to apply Theorem 4 for the same spaces $\mathcal{H}_1 = H$ and $\mathcal{H}_2 = U$. Recall that we could not do this immediately for a $U$-valued initial condition and the sequence of Galerkin solutions, due to the difficulty of gaining suitable control on the noise term arising from the difference of the projections. In the present scenario we consider solutions of the unprojected equation (2), and so we are not burdened with this difficulty. An application of Theorem 4 would rely on us being able to conclude that each maximal solution $(\Psi^n, \Theta^n)$ corresponding to the initial condition $\mathcal{P}_n\Psi_0$ exists up until the stopping time (36) (where the $\Psi^n$ notation has now shifted to these solutions). This is not immediate from Theorem 2, though we can use similar maximality arguments to extend these solutions to $\tau_n^{M,t}$ at the cost of some regularity. Indeed, for these extended solutions we have only the regularity of the $H$-valued solution, but with the additional benefit that $\Psi_t(\omega)\mathbf{1}_{\cdot \le \tau_n^{M,t}(\omega)} \in V$ almost everywhere on the product space $\Omega \times [0,\infty)$. This facilitates the use of Assumption 4 in order to show the Cauchy property (42), but only via first using an Itô Formula with the bilinear form $\langle\cdot,\cdot\rangle_{X \times H}$. We must make this step as the identity for these extended solutions is only satisfied in $X$, hence we cannot use the $U$ inner product. The stochastic integral, though, can be constructed in $U$ following Remark 5, and the regularity $\Psi_t(\omega)\mathbf{1}_{\cdot \le \tau_n^{M,t}(\omega)} \in V$ allows us to call upon the property (8) so that we can apply Assumption 4. Without the uniform boundedness (41) for these solutions we need Assumption 9 instead of just Assumption 5 to deduce (43).
The conclusion of the proof of Theorem 3 then follows identically to that of Theorem 2, now using Assumption 8 for the uniqueness part and (24) to show the convergence of the time integral term when justifying that the limiting pair $(\Psi, \tau_\infty^{M,t})$ obtained from Theorem 4 is an $H$-valued local strong solution.

#### *4.2 SALT Navier-Stokes in the Abstract Framework*

We now briefly comment on the application of this abstract framework to Eq. (4) in order to conclude the paper. In the previous subsection we have already established the identification of the spaces

$$V := W\_{\sigma}^{3,2}(\mathbb{T}^3; \mathbb{R}^3), H := W\_{\sigma}^{2,2}(\mathbb{T}^3; \mathbb{R}^3), U := W\_{\sigma}^{1,2}(\mathbb{T}^3; \mathbb{R}^3), X := L\_{\sigma}^2(\mathbb{T}^3; \mathbb{R}^3)$$

at which point we address the question posed in that subsection as to why we need to make this effort with the $V$-valued solutions before showing the existence of the $H$-valued ones. That is, why would Assumption Set 1 not hold if we were to shift the spaces from $V$ to $H$, $H$ to $U$ and $U$ to $X$ (with some modifications of the reference to $X$ in Assumption Set 1)? One clear answer is in the treatment of the nonlinear term for (17): for $H = W^{2,2}_\sigma(\mathbb{T}^3;\mathbb{R}^3)$ we have the algebra property of the Sobolev Space, which affords us the bound

$$\|\mathcal{L}\_{\phi^n}\phi^n\|\_{2} \le c \|\phi^n\|\_{2} \|\phi^n\|\_{3}$$

using the equivalence of the $\|\cdot\|_2$ norm and the standard $W^{2,2}$ one. In the $W^{1,2}$ norm we do not have the same luxury, and so this nonlinear term cannot be bounded just in terms of the $W^{1,2}$ and $W^{2,2}$ norms, as would be required. It is worth noting the significance of using the $\langle\cdot,\cdot\rangle_2$ inner product here, as in the same assumption this facilitates the 'integration by parts' property for the Stokes Operator in order to gain the additional control we require (i.e. the $-\kappa\|\phi^n\|_V^2$ term). There is some additional care required then to control the noise terms in these inner products, but this is facilitated by the same standard cancellation argument that

$$
\langle \mathcal{L}\_{\xi\_l} \phi, \phi \rangle\_{L^2} = 0 \tag{46}
$$

for $\phi \in W^{1,2}(\mathbb{T}^3;\mathbb{R}^3)$, as well as appreciating that the commutator $[\Delta, B_i]$ is of second order, and commuting the $B_i$ through $\Delta$ until we reduce to a term of the form (46). The control (3) allows the $\xi_i$ to be effectively ignored in many of these computations, by simply pulling them out with the supremum. We refer once more to [6] for the complete details. Of course, it is Theorem 3 which translates into the main theorem of this paper (Theorem 1), though it is also worth noting that, having shown Theorem 2 in this context, we can also say something about the retained regularity of our solutions coming from a more regular initial condition. To really make this point we would have to show that the maximal times for the different notions of solution are in fact the same, and this is to be addressed in [6].
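The cancellation (46) is the standard energy identity for transport by a divergence-free vector field. As a sketch of the computation, assuming $\mathcal{L}_{\xi_i}$ here denotes the transport part $(\xi_i\cdot\nabla)$ and that $\xi_i$ is divergence-free on the torus (so there are no boundary terms):

```latex
\begin{aligned}
\langle \mathcal{L}_{\xi_i}\phi, \phi \rangle_{L^2}
  &= \int_{\mathbb{T}^3} \big((\xi_i\cdot\nabla)\phi\big)\cdot\phi \,dx
   = \frac{1}{2}\int_{\mathbb{T}^3} \xi_i\cdot\nabla|\phi|^2 \,dx \\
  &= -\frac{1}{2}\int_{\mathbb{T}^3} (\nabla\cdot\xi_i)\,|\phi|^2 \,dx = 0 .
\end{aligned}
```

Any lower-order part of the full SALT operator would be treated separately, as in the commutator argument above.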

#### **Appendix**

Here we state [14, Lemma 5.1].

**Theorem 4** *Let* $\mathcal{H}_1 \subset \mathcal{H}_2$ *be Hilbert Spaces with continuous embedding, and let* $(\Psi^n)$ *be a sequence of processes such that for* $\mathbb{P}$-*a.e.* $\omega$, $\Psi^n(\omega) \in C([0,T]; \mathcal{H}_2) \cap L^2([0,T]; \mathcal{H}_1)$, *which is a Banach Space with norm*

$$\|\Psi\|_{X(T)} := \left(\sup_{r \in [0,T]} \|\Psi_r\|_{\mathcal{H}_2}^2 + \int_0^T \|\Psi_r\|_{\mathcal{H}_1}^2\, dr\right)^{\frac{1}{2}}.$$

*For some fixed M >* 1 *and t >* 0 *define the stopping times*

$$\tau\_n^{M,t}(\omega) := t \wedge \inf \left\{ s \ge 0 : \left\| \Psi^n(\omega) \right\|\_{X(s)}^2 \ge M + \left\| \Psi\_0^n(\omega) \right\|\_{\mathcal{H}\_2}^2 \right\}$$

*and suppose that*

$$\lim_{m \to \infty} \sup_{n \ge m} \mathbb{E}\|\Psi^n - \Psi^m\|_{X(\tau_m^{M,t} \wedge \tau_n^{M,t})}^2 = 0 \tag{47}$$

*and*

$$\lim_{S \to 0} \sup_{n \in \mathbb{N}} \mathbb{P}\left(\|\Psi^n\|_{X(\tau_n^{M,t} \wedge S)}^2 \ge M - 1 + \|\Psi_0^n\|_{\mathcal{H}_2}^2\right) = 0.$$

*Then there exists a stopping time* $\tau_\infty^{M,t}$, *a subsequence* $(\Psi^{n_l})$ *and a process* $\Psi = \Psi_{\cdot\wedge\tau_\infty^{M,t}}$ *such that:*

– $\mathbb{P}\left(0 < \tau_\infty^{M,t} \le \tau_{n_l}^{M,t}\right) = 1$*;*
– *For* $\mathbb{P}$-*a.e.* $\omega$, $\Psi(\omega) \in C\left([0,\tau_\infty^{M,t}(\omega)]; \mathcal{H}_2\right) \cap L^2\left([0,\tau_\infty^{M,t}(\omega)]; \mathcal{H}_1\right)$*;*
– *For* $\mathbb{P}$-*a.e.* $\omega$, $\Psi^{n_l}(\omega) \to \Psi(\omega)$ *in* $\left(C\left([0,\tau_\infty^{M,t}(\omega)]; \mathcal{H}_2\right) \cap L^2\left([0,\tau_\infty^{M,t}(\omega)]; \mathcal{H}_1\right),\ \|\cdot\|_{X(\tau_\infty^{M,t}(\omega))}\right)$.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Coupling of Waves to Sea Surface Currents Via Horizontal Density Gradients**

**Darryl D. Holm, Ruiao Hu, and Oliver D. Street**

**Abstract** The mathematical models and numerical simulations reported here are motivated by satellite observations of horizontal gradients of sea surface temperature and salinity that are closely coordinated with the slowly varying envelope of the rapidly oscillating waves. This coordination of gradients of fluid material properties with wave envelopes tends to occur when strong horizontal buoyancy gradients are present. The nonlinear models of this coordinated movement presented here may provide future opportunities for the optimal design of satellite imagery that could simultaneously capture the dynamics of both waves and currents directly.

The model derived here appears in two levels of approximation: first for rapidly oscillating waves, and then for their slowly varying envelope (SVE) approximation obtained by using the WKB approach. The WKB wave-current-buoyancy interaction model derived here for a free surface with significant horizontal buoyancy gradients indicates that the mechanism for the emergence of these correlations is the ponderomotive force of the slowly varying envelope of rapidly oscillating waves acting on the surface currents via the horizontal buoyancy gradient. In this model, the buoyancy gradient appears explicitly in the WKB wave momentum, which in turn generates density-weighted potential vorticity whenever the buoyancy gradient is not aligned with the wave-envelope gradient.

**Keywords** Nonlinear water waves · Free surface fluid dynamics · Geometric mechanics

**Supplementary Information** The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-18988-3\_8

D. D. Holm · R. Hu · O. D. Street (-)

Department of Mathematics, Imperial College London, London, UK
e-mail: d.holm@imperial.ac.uk; ruiao.hu15@imperial.ac.uk; o.street18@imperial.ac.uk

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_8

#### **1 Introduction**

#### *1.1 Submesoscale Sea Surface Dynamics*

Capabilities in sea surface observation have been improving rapidly during the past two decades [1]. In particular, new high-resolution satellite observation capabilities are revealing sea surface features seen for the first time at *submesoscale* spatial scales of 100 m–10 km and time scales of hours to weeks. Invariably, the new satellite imagery reveals a plethora of coupled dynamical surface phenomena, including currents, spiral filaments, flotsam patterns, jets and fronts, some of which are detected indirectly through gradients of sea surface temperature, salinity or colour, in addition to the imagery [5, 10, 13, 20, 26].

The new capabilities in sea surface observation are still developing. For example, the impending Surface Water Ocean Topography (SWOT) mission will map the ocean surface mesoscale sea surface height field, as well as a large fraction of the associated submesoscale field, including buoyancy fronts [17]. A sample of this type of submesoscale data taken from [5] is shown in Figs. 1 and 2.

The coming new age of higher-resolution upper ocean observations will present a formidable array of challenges for the next generation in data management, computational simulation and mathematical modelling. This paper will offer a mathematical modelling framework that is flexible enough to admit uncertainty

**Fig. 1** Wave activity in the submesoscale ocean is dynamically complex, as illustrated in this figure showing the zoomed image of a submesoscale sea surface elevation, seen in Envisat MERIS glitter observations. This image shows the wave elevation tracking a cyclonic eddy visible in the sea surface glitter observations. The pixel resolution is 250 m. This glitter image demonstrates the complex, highly-coordinated dynamical forms taken in wave-current interaction on the submesoscale sea surface. In particular, notice the instabilities developing in the eddy's outer boundary. Image courtesy of B. Chapron

**Fig. 2** Comparison of the two images above demonstrates the emergent coherence between sea surface temperature and the glitter patterns visible from satellite imagery. The thermal fronts visible are dynamic, and sea surface roughness is most obvious along the strongest fronts. Discussions of the interpretation of sun glitter measurements are given in [5, 20, 26]. Images courtesy of B. Chapron

quantification through stochastic modelling and analysis, applied in concert with high-resolution observations, computational simulations, and stochastic data assimilation for large data sets. This framework involves decomposing the surface motion into a two-dimensional horizontal flow map representing transport by the current acting on a one-dimensional vertical flow map representing wave-like motion of the elevation. This composition-of-maps modelling framework is described and applied to model sea-surface dynamics in two deterministic examples in Sect. 2 of the present paper.

**Emergent Coherence (EC)** Combining high-resolution thermal data (buoyancy) with glitter data for the wave elevation as in Fig. 2 has recently revealed yet another interesting feature of submesoscale dynamics. Namely, the observed submesoscale data show extremely high correlations of wave, current and thermal properties [5]. This emergent spatial-temporal coherence of dynamic and thermal properties presents a significant challenge for dynamical submesoscale modelling. Accepting this challenge, the aim of this paper is to derive a mathematical model of nonlinear sea surface dynamics whose solutions also demonstrate the emergent coherence observed in combining different types of submesoscale data. This paper derives new *two-dimensional* equations that show the emergent coherence (EC) seen in the sea surface features appearing in Fig. 2. The EC behaviour produced by the equations derived here are demonstrated in Fig. 3 which shows a snapshot of the coherence of buoyancy and wave amplitude distributions in the dynamics of divergence-free two-dimensional flow acting on free surface vertical elevation wave features moving under gravity. In the model equations, the horizontal buoyancy gradients mediate the interactions between the vertical elevation waves and the horizontal currents. The equations of motion represent the current as a time-dependent, area-preserving map of the horizontal plane into itself and the waves as the composition of the horizontal flow map with a time-dependent vertical elevation map. Thus, the model involves a dynamical composition of maps (C◦M).

#### **2 Submesoscale Thermal Wave-Current Dynamics on a Free Surface**

#### *2.1 Surface Waves as Symmetry-Breaking Features of Local Force Imbalances*

Waves are propagating symmetry-breaking features that signify the response to a local imbalance of forces. Thus, from the viewpoint of satellite oceanography, observations of waves (defined as propagating sea surface elevation features) signify processes at the surface or below the surface whose presence introduces forces that locally break the symmetry of the surface. The sea surface would otherwise follow the stable global gravitational balance of the geoid, which we regard here as being spherical. Thus, waves arise from a spatially local imbalance of forces in the neighbourhood of a stable equilibrium. The propagating feature of relevance here is the wave elevation, measured as the local departure of the surface level in the direction normal to its equilibrium mean level. The symmetry broken here is the invariance of the sea surface under spatial translations tangent to the equilibrium surface level, also known as the local *horizontal* direction. Hence, from the viewpoint of satellite oceanography, sea surface waves are observed as local vertical displacements of the otherwise horizontal motion of the ocean currents on

**Fig. 3** This is a $512^2$ snapshot of the C◦M equations in the SVE approximation in the potential vorticity form in (45). The four panels display the following distributions: modified potential vorticity Q-PV in (43) (top left), buoyancy (top right), square of the wave amplitude (bottom left) and wave phase (bottom right) in the numerical simulation of the dynamics of divergence-free flow on a free surface moving under gravity. The simulation began with a spin-up period with zero wave amplitude. After the spin-up period, as explained in Sect. 3, a checker-board pattern of finite wave amplitude with *zero phase* was introduced and the simulation was resumed. The 'mixing' of these wave patterns eventually brought them into coherence with the spatial distributions of thermal properties and potential vorticity. These features show an emergent coherence in patterns similar to those seen in the corresponding high-resolution satellite data in Fig. 2

the sea surface. From the mathematical modelling viewpoint, sea surface waves are local vertical oscillations of the horizontal surface that are carried along by the horizontal current flow, envisaged as a smooth invertible time-dependent map of the horizontal surface into itself. This is the composition of maps (C◦M) modelling approach for describing the dynamics of horizontal fluid flows (currents) acting on oscillating vertical elevations (waves). Since the surface current velocity, its advected material properties and the wave elevation are all that can be observed in satellite oceanography, the task in three-dimensional ocean modelling for satellite oceanography devolves into determining the dynamical surface features that are produced by the three-dimensional flow processes below the surface arising from, e.g., bathymetry, stratification, rotation, Langmuir circulation, and thermal effects such as frontogenesis. The dynamics of the surface signatures of these three-dimensional flow processes, as well as the effects of air-sea interactions on the surface, need to be interpreted in order to understand what satellite oceanography observes.

#### *2.2 A Tale of Two Maps: Currents and Waves*

**Story Line** Waves on the surface of the ocean are modelled here as a composition of two smooth invertible maps describing the temporal evolution and advection of two degrees of dynamical freedom interacting at widely separated space-time scales. In this composition of maps (C◦M) approach, the waves are regarded as local vertical disturbances that rapidly oscillate as they are swept along by the broad, slowly changing horizontal currents. Thus, the slow current motion is a Lagrangian coordinate for the rapid wave oscillations. This wide separation in space-time scales invokes the classical WKB description. The standard WKB approach seeks a rapidly oscillating wave packet solution whose phase-averaged amplitude possesses a slowly varying envelope (SVE) spatially. The WKB method is often applied via a variational principle because in a variational setting the phase average naturally leads to an adiabatic invariant known as the wave action density, cf. for example, [3] for a review of the WKB or SVE method in fluid dynamics. Here we will follow the variational approach of [4, 11] guided by the classical work of [22, 24, 25].

**Submesoscale Sea-Surface Motion: Composition of Two Time-Dependent Maps** The position and velocity of fluid parcels in motion under gravity on a 2D free surface embedded in $\mathbb{R}^3$ have both horizontal and vertical components. The corresponding flow maps are denoted as the map $\phi_t : \mathbb{R}^2 \to \mathbb{R}^2$ for the horizontal current flow, and as the composite map $\zeta_t \circ \phi_t$ for the vertical elevation of the waves as a function of time and position in $\mathbb{R}^2$. The flow lines of these two components of the flow map of a free surface can be written as

$$\mathbf{r}_{t} = \phi_{t}\mathbf{r}_{0} \quad \text{and} \quad z_{t} = \zeta_{t}(\phi_{t}\mathbf{r}_{0}) =: \zeta_{t}(\mathbf{r}_{t})\,,$$

where $\mathbf{r}_t = (x_t, y_t) \in \mathbb{R}^2$ is the horizontal position along the flow at time $t$ and $\zeta_t(\mathbf{r}_t)$ is the vertical elevation at horizontal position $\mathbf{r}_t$ at time $t$, starting at position $\mathbf{r}_0$ at time $t = 0$. Thus, one may say that the initial position of the flow line, $\mathbf{r}_0$, is a Lagrangian coordinate for the horizontal motion, and the horizontal motion is a Lagrangian coordinate for the vertical motion. That is, the 'footpoint' at time $t$ of the vertical component of the flow map $\zeta_t$ is located in the horizontal plane along a curve $\phi_t\mathbf{r}_0$ parameterised by time $t$. Likewise, one can simply say that the wave dynamics is advected, or swept along, by the current dynamics.

Hence, the corresponding horizontal and vertical components of velocity along a stream line *r<sup>t</sup>* in the horizontal plane are defined by,

$$\frac{d\mathbf{r}_t}{dt} = \frac{d}{dt}(\phi_t \mathbf{r}_0) = \widehat{\mathbf{v}}_t(\phi_t \mathbf{r}_0) =: \widehat{\mathbf{v}}_t(\mathbf{r}_t)\,, \quad \text{so} \quad \widehat{\mathbf{v}}_t = \frac{d\phi_t}{dt}\phi_t^{-1} \quad \text{and}$$

$$\frac{dz_t}{dt} = \widehat{w}_t(\mathbf{r}_t) = \frac{d}{dt}\big(\zeta_t(\phi_t\mathbf{r}_0)\big) = \partial_t\zeta_t(\mathbf{r}_t) + \widehat{\mathbf{v}}_t(\mathbf{r}_t)\cdot\nabla_{\mathbf{r}}\zeta_t(\mathbf{r}_t)\,.$$

That is, in the dynamics of free surface flow, the vertical velocity $\widehat{w}(\mathbf{r},t)$ at a given Eulerian point $\mathbf{r}$ and time $t$ is related to the wave elevation $\zeta(\mathbf{r},t)$ and horizontal velocity $\widehat{\mathbf{v}}(\mathbf{r},t)$ at that point by

$$
\widehat{w}(\mathbf{r},t) = \partial_t \zeta(\mathbf{r},t) + \widehat{\mathbf{v}}(\mathbf{r},t) \cdot \nabla_{\mathbf{r}} \zeta(\mathbf{r},t) \,.
$$
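Since this kinematic identity is just the chain rule for the composition of the two maps, it can be checked numerically; the rigid-rotation flow map and the elevation field below are arbitrary illustrative choices, not taken from the text.

```python
import numpy as np

# Toy horizontal flow map r_t = phi_t(r_0): a rigid rotation (illustrative).
def flow(r0, t):
    c, s = np.cos(t), np.sin(t)
    return np.array([c*r0[0] - s*r0[1], s*r0[0] + c*r0[1]])

# Toy Eulerian elevation field zeta(r, t) (illustrative).
def zeta(r, t):
    return np.sin(r[0] + 2.0*r[1]) * np.exp(-t)

r0 = np.array([0.3, 0.7])
t, dt, h = 0.5, 1e-6, 1e-6

# Left side: total time derivative of the composition zeta(phi_t r0, t).
lhs = (zeta(flow(r0, t + dt), t + dt) - zeta(flow(r0, t - dt), t - dt)) / (2*dt)

# Right side: partial_t zeta + v . grad zeta at the footpoint r_t, with the
# Eulerian velocity v obtained by differentiating the flow map itself.
rt = flow(r0, t)
v = (flow(r0, t + dt) - flow(r0, t - dt)) / (2*dt)
dz_dt = (zeta(rt, t + dt) - zeta(rt, t - dt)) / (2*dt)
grad = np.array([(zeta(rt + [h, 0], t) - zeta(rt - [h, 0], t)) / (2*h),
                 (zeta(rt + [0, h], t) - zeta(rt - [0, h], t)) / (2*h)])
rhs = dz_dt + v @ grad
print("chain-rule check:", lhs, "vs", rhs)
```

Both sides agree to the accuracy of the central differences, confirming that the wave elevation is swept along by the current.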

In terms of these fluid variables, one could propose a Hamilton's principle for wave–current interaction of a free surface by following [8] for the variational modelling framework and applying [24, 7] for the potential energy, to find<sup>1</sup>

$$\begin{aligned} 0 = \delta S &= \delta \int_a^b \ell(\widehat{\mathbf{v}}, \zeta, D, \rho)\,dt \\ &= \delta \int_a^b\!\!\int_{\mathcal{D}} \left( \frac{1}{2}\left( |\widehat{\mathbf{v}}|^2 + \sigma^2\big(\partial_t\zeta + \nabla_{\mathbf{r}}\zeta\cdot\widehat{\mathbf{v}}\big)^2 \right) - \frac{\zeta^2}{2Fr^2} \right) D\rho - p(D-1)\,d^2r\,dt\,. \end{aligned} \tag{1}$$

To interpret the variational principle proposed in (1) we rewrite its Lagrangian as a sum of an Eulerian spatial integral and an integral over material mass elements $d^2r_0 = D\rho\,d^2r$ which follow the paths of the horizontal fluid motion, $\mathbf{r}(\mathbf{r}_0,t) = \phi_t\mathbf{r}_0$,

$$0 = \delta S = \delta \int_a^b\!\!\int_{\mathcal{D}} \frac{D\rho}{2}|\widehat{\mathbf{v}}|^2 - p(D-1)\,d^2r\,dt + \delta \int_a^b\!\!\int_{\mathcal{D}_0} \frac{\sigma^2}{2}\dot{\zeta}^2 - \frac{\zeta^2}{2Fr^2}\,d^2r_0\,dt\,. \tag{2}$$

<sup>1</sup> In [8] the potential energy was linear in $\zeta$. This linearity neglected the restoring force due to the vertical pressure gradient via Archimedes' principle. Adopting a potential energy quadratic in $\zeta$ regains this restoring force.

Variations of the first summand in (2) at fixed spatial position $\mathbf{r}$ yield the Euler fluid equations for 2D divergence-free flow with advected buoyancy, $\rho(\mathbf{r},t) = \rho(\phi_t\mathbf{r}_0) = \rho_0(\mathbf{r}_0)$,

$$
\partial_t\widehat{\mathbf{v}} + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\widehat{\mathbf{v}} = -\frac{1}{\rho}\nabla_{\mathbf{r}}p\,, \quad \operatorname{div}_{\mathbf{r}}\widehat{\mathbf{v}} = 0\,, \quad \partial_t\rho + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\rho = 0\,. \tag{3}
$$

Variations of the second summand in (2) taken at fixed mass element *(r*0*)* yield equations for vertical harmonic oscillations of the elevation of each material mass element

$$
\sigma^2 \ddot{\zeta}(\mathbf{r}_0, t) = \sigma^2 \frac{d^2\zeta}{dt^2}\Big|_{\mathbf{r}_0} = -\frac{\zeta(\mathbf{r}_0, t)}{Fr^2}\,. \tag{4}
$$

The wave-elevation equation in (4) is unrealistic, though, because it implies that fluid mass elements with different labels $\mathbf{r}_0$ would be oscillating in phase, all with the same frequency, as they follow the flow of the Euler fluid equations (3) for 2D divergence-free flow with advected buoyancy. This unrealistic synchronisation and resonance can be removed by including the inertia of each mass element. This can be done by including the initial buoyancy of each mass element, as

$$
\sigma^2 \ddot{\zeta}(\mathbf{r}_0, t) = \sigma^2 \frac{d^2\zeta}{dt^2}\Big|_{\mathbf{r}_0} = -\frac{\rho_{ref}}{\rho_0(\mathbf{r}_0)} \frac{\zeta(\mathbf{r}_0, t)}{Fr^2}\,. \tag{5}
$$
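The de-synchronisation mechanism in (5) can be seen in a minimal numerical sketch: two fluid labels with different initial buoyancies oscillate at different frequencies and drift out of phase. The parameter values below are illustrative choices of our own, not values from the text.

```python
import numpy as np

def elevation(t, rho0, rho_ref=1.0, sigma2=1e-4, Fr2=1.0, zeta0=1.0):
    """Exact solution of sigma^2 * zeta'' = -(rho_ref/rho0) * zeta / Fr^2,
    Eq. (5), for a fluid label of initial buoyancy rho0, started from rest."""
    omega = np.sqrt(rho_ref / (rho0 * sigma2 * Fr2))
    return zeta0 * np.cos(omega * t)

t = np.linspace(0.0, 1.0, 1000)
z_light = elevation(t, rho0=0.98)   # illustrative buoyancy values
z_heavy = elevation(t, rho0=1.02)

# With rho0-dependent frequencies, the two labels drift out of phase,
# removing the unrealistic synchronisation implied by Eq. (4).
phase_gap = np.max(np.abs(z_light - z_heavy))
print(f"max elevation difference over the window: {phase_gap:.3f}")
```

A uniform buoyancy $\rho_0 = \rho_{ref}$ would make `phase_gap` identically zero, recovering the synchronised oscillations of (4).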

At this point in our reasoning, we have not yet considered the differences in space and time scales between the fluid flow and the wave activity. In what follows, we will use the simple composition-of-maps idea explained here along with estimates of relative space and time scales to investigate the applicability of this class of models. To improve the applicability of the model comprising (3) and (5) for describing the effects of currents on waves, we will derive a related model in the slowly varying envelope (SVE) approximation. The SVE approximation allows considerations of current and wave dynamics at the same space and time scales.

The comparisons of the simulated solutions of these C◦M models with the observations in Figs. 1, 2, 3, and 4 above indicate that these models can indeed produce results that match some aspects of observed features. However, these models are not derived from three-dimensional fluid equations. Instead, they are derived from the simple solution Ansatz in Hamilton's principle that the vertical elevation of the sea surface wave activity is carried by divergence-free horizontal fluid motion. The latter assumption is a weakness of the current approach, because it precludes effects of vertical up-welling and down-welling, which are observed to occur along with convergence and divergence of currents [10]. The equations derived here are also not associated with classical surface wave equations such as the nonlinear Schrödinger (NLS) equation, or other celebrated surface wave equations. This departure from the classical water wave literature may be regarded as another weakness of the current approach.

**Fig. 4** These $512^2$ snapshots of the C◦M simulation in the vorticity form (25) show the elevation $\zeta$ in the left panel and the density-weighted vertical velocity $\widetilde{w}$ on the right. The snapshots are taken at the same time and with the same fluid spin-up initial conditions as the snapshots of the simulation of the SVE approximate equations presented in Fig. 3. Overlaying the two figures demonstrates that the resolved features in the $\zeta$ distribution in this figure of C◦M results are bounded by the SVE wave envelope distribution $|a|^2$ in Fig. 3

**Estimating Parameters** $\sigma^2$ **and** $Fr^2$ **for Satellite Observations** The Lagrangian $\ell(\widehat{\mathbf{v}},\zeta,D,\rho)$ in (1) represents the dimension-free difference of the kinetic and potential energies, augmented by the incompressibility constraint imposed by the Lagrange multiplier $p$. Two dimension-free parameters ($\sigma^2$ and $Fr^2$) appear in this Hamilton's principle. The coefficient $\sigma^2 = ([H]/[L])^2$ in formula (1) is the square of the vertical-to-horizontal aspect ratio. Typically, for satellite observations of submesoscale dynamics one finds

$$[H] \approx (3\times10^{-4} - 3\times10^{-3})\,\text{km} \quad\text{and}\quad [L] \approx (10^{-1} - 10)\,\text{km}, \quad\text{so}\quad \sigma^2 \approx 10^{-6} - 10^{-3} \ll 1$$

for the squared aspect ratio $\sigma^2 \ll 1$ of the height of the waves $[H]$ relative to the breadth $[L]$ of the two-dimensional domain. The squared 'Froude number' $Fr^2$ in this regime is estimated by the square of the ratio of horizontal and vertical frequency scales at the sea surface,

$$Fr^2 := \left(\frac{[V]/[H]}{N}\right)^2 \approx 1 - 10^4\,. \tag{6}$$

Here, the horizontal velocity on the sea surface is taken as $[V] = (0.1 - 1)$ m/s and $[H] = (0.3 - 3)$ m. According to [9], the Brunt–Väisälä buoyancy frequency in the sea surface wave regime is given by $N \approx (10^{-4} - 10^{-3})$/s. The ratio of horizontal and vertical frequency scales at the sea surface in (6) is selected for use later in applying the slowly varying envelope (SVE) wave approximation in Sect. 2.4. Hence, the squared product of the 'Froude number' and aspect ratio for satellite observations of the sea surface can reasonably be estimated over the range

$$
\sigma^2 F r^2 := \left(\frac{[V]}{N[L]}\right)^2 \approx 10^{-3} - 10. \tag{7}
$$
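These order-of-magnitude estimates amount to simple arithmetic on the quoted scales. The sketch below evaluates the extreme pairings, whose span reproduces the quoted orders of magnitude for $\sigma^2$ and $\sigma^2 Fr^2$; the bookkeeping of which extremes to pair is our own.

```python
# Arithmetic check of the dimension-free parameters from the quoted scales
# (converted to SI units).
H = (0.3, 3.0)       # wave height scale [H] in m
L = (1e2, 1e4)       # horizontal breadth [L] in m (0.1 - 10 km)
V = (0.1, 1.0)       # horizontal velocity [V] in m/s
N = (1e-4, 1e-3)     # Brunt-Vaisala frequency N in 1/s

# Squared aspect ratio, sigma^2 = ([H]/[L])^2, over the extreme pairings.
sigma2 = ((H[0]/L[1])**2, (H[1]/L[0])**2)
# The combination sigma^2 * Fr^2 = ([V]/(N [L]))^2 controlling the SVE regime.
s2Fr2 = ((V[0]/(N[1]*L[1]))**2, (V[1]/(N[0]*L[0]))**2)

print(f"sigma^2       in [{sigma2[0]:.0e}, {sigma2[1]:.0e}]")
print(f"sigma^2 Fr^2  in [{s2Fr2[0]:.0e}, {s2Fr2[1]:.0e}]")
```

The computed band for $\sigma^2 Fr^2$ straddles unity, which is what makes the SVE regime $\sigma^2 Fr^2 \ll 1$ attainable but not automatic for satellite-observed submesoscale flows.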

**Modelling the Dynamic Effects of Surface Density Variations** As mentioned earlier, the observed oscillations of sea surface waves are by no means simultaneous across the whole domain, although the observations show that they are indeed coordinated spatially with the buoyancy of the fluid. To correct this solution behaviour, the kinetic energy and potential energy need to be de-synchronised from the buoyancy.

The dynamic dependence of the wave kinetic energy on the density is physically required. However, to de-synchronise the wave oscillations we can introduce a constant reference density *ρref* into the wave potential energy, by writing

$$\frac{\zeta^2}{Fr^2} \to \frac{\rho_{ref}}{\rho} \frac{\zeta^2}{Fr^2} \quad\text{with}\quad \frac{\rho_{ref}}{\rho} \quad\text{of order}\quad O(1)\,. \tag{8}$$

The quantity $\rho_{ref}$ is a constant reference density, and the density ratio $(\rho_{ref}/\rho) = O(1)$.

The density dependence imposed here is important in the dynamics that follows from Hamilton's principle. Substituting the relations in (8) into Hamilton's principle in Eq. (1) leads to the following dimension-free action integral,

$$\begin{aligned} 0 = \delta S &= \delta \int_a^b \ell(\widehat{\mathbf{v}}, \zeta, D, \rho)\,dt \\ &= \delta \int_a^b\!\!\int_{\mathcal{D}} \left( \frac{1}{2}\left( |\widehat{\mathbf{v}}|^2 + \sigma^2\big(\partial_t\zeta + \nabla_{\mathbf{r}}\zeta\cdot\widehat{\mathbf{v}}\big)^2 \right) - \frac{\rho_{ref}}{\rho}\frac{\zeta^2}{2Fr^2} \right) D\rho - p(D-1)\,d^2r\,dt\,. \end{aligned} \tag{9}$$

The advected quantities $D(\mathbf{r},t)\,d^2r$ and $\rho(\mathbf{r},t)$ evolve via push-forward by the horizontal flow map, $\phi_t$. For example, $D_t\,d^2r_t = \phi_{t*}(D_0\,d^2r_0)$ and $\rho_t = \phi_{t*}\rho_{ref}$ denote, respectively, evolution of the determinant of the Lagrange-to-Euler map and of the local scalar value of the mass density. Conservation of mass is then expressed as the push-forward relation, $D_t\rho_t\,d^2r_t = \phi_{t*}(D_0\rho_{ref}\,d^2r_0)$. The pressure $p$ in (9) acts as a Lagrange multiplier to enforce conservation of area, so that $D_t = 1 = \phi_{t*}D_0$, and the horizontal flow is incompressible, which implies that the horizontal velocity is divergence-free, i.e., $\operatorname{div}_{\mathbf{r}}\widehat{\mathbf{v}}(\mathbf{r},t) = 0$. Taking variations of the action integral (9) yields the following set of equations,

$$\begin{aligned}
\delta\widehat{\mathbf{v}}:&\quad \frac{\delta\ell}{\delta\widehat{\mathbf{v}}} = D\rho\left(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\right)\otimes d^2r\,,\quad\text{with}\quad \widehat{w} = \partial_t\zeta + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\zeta\,,\\
\delta\zeta:&\quad \partial_t\big(\sigma^2 D\rho\,\widehat{w}\big) + \operatorname{div}_{\mathbf{r}}\big(\sigma^2 D\rho\,\widehat{w}\,\widehat{\mathbf{v}}\big) + D\,\frac{\zeta\rho_{ref}}{Fr^2} = 0\,,\\
\delta D:&\quad \frac{\delta\ell}{\delta D} = \frac{\rho}{2}\big(|\widehat{\mathbf{v}}|^2 + \sigma^2\widehat{w}^2\big) - \frac{\rho_{ref}\,\zeta^2}{2Fr^2} - p =: \rho\widetilde{\varpi} - \widetilde{p}\,,\\
\delta\rho:&\quad \frac{\delta\ell}{\delta\rho} = \frac{D}{2}\big(|\widehat{\mathbf{v}}|^2 + \sigma^2\widehat{w}^2\big) =: D\widetilde{\varpi}\,,\qquad \widetilde{p} := p + \frac{\rho_{ref}\,\zeta^2}{2Fr^2}\,,\\
\delta p:&\quad D - 1 = 0 \implies \operatorname{div}_{\mathbf{r}}\widehat{\mathbf{v}} = 0\,.
\end{aligned} \tag{10}$$

From their definitions as advected quantities, one also knows that *D* and *ρ* satisfy

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})(D\,d^2r) = 0 &\implies \partial_t D + \operatorname{div}_{\mathbf{r}}(D\widehat{\mathbf{v}}) = 0 \quad\text{with}\quad D = 1\,,\\
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\rho = 0 &\implies \partial_t\rho + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\rho = 0\,,
\end{aligned} \tag{11}$$

where $\mathcal{L}_{\widehat{\mathbf{v}}}$ denotes the Lie derivative operation along the horizontal velocity vector field, $\widehat{\mathbf{v}}$, which provides coordinate-free brevity in the notation.

**Theorem 1 (Kelvin–Noether Circulation Theorem)** *Use of the Euler–Poincaré (EP) theorem yields the following Kelvin circulation theorem*

$$\frac{d}{dt}\oint_{c(\widehat{\mathbf{v}})} \left(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\right) = -\oint_{c(\widehat{\mathbf{v}})} \frac{1}{\rho}\,d\widetilde{p}\,. \tag{12}$$

*Proof* The Euler-Poincaré (EP) theorem in this case yields

$$(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}}) \frac{\delta\ell}{\delta\widehat{\mathbf{v}}} = \frac{\delta\ell}{\delta D}\diamond D + \frac{\delta\ell}{\delta\rho}\diamond\rho := D\nabla_{\mathbf{r}}\frac{\delta\ell}{\delta D} - \frac{\delta\ell}{\delta\rho}\nabla_{\mathbf{r}}\rho\,. \tag{13}$$

Here the diamond ($\diamond$) operator is defined by

$$\left\langle \frac{\delta\ell}{\delta a}\diamond a\,,\,X\right\rangle_{\mathfrak{X}} =: \left\langle \frac{\delta\ell}{\delta a}\,,\,-\mathcal{L}_X a\right\rangle_V\,. \tag{14}$$

In addition, $X \in \mathfrak{X}$ is a (smooth) vector field defined on $\mathbf{R}^2$ and $a \in V$, a vector space of advected quantities, which are here the scalar function, $\rho$, and the areal density $D\,d^2r$. Using the advection relations for $D$ and $\rho$ in (11) and the corresponding variational derivatives in (10) simplifies the EP equation in (13) to

$$(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\left(\frac{1}{D\rho}\frac{\delta\ell}{\delta\widehat{\mathbf{v}}}\right) = \frac{1}{\rho}\nabla_{\mathbf{r}}\frac{\delta\ell}{\delta D} - \frac{1}{D\rho}\frac{\delta\ell}{\delta\rho}\nabla_{\mathbf{r}}\rho\,. \tag{15}$$

Equation (10) then yields

$$(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\left(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\right) = -\rho^{-1}d\widetilde{p} + d\widetilde{\varpi}\,, \quad\text{with}\quad \widetilde{\varpi} := \tfrac{1}{2}\big(|\widehat{\mathbf{v}}|^2 + \sigma^2\widehat{w}^2\big)\,.$$

Inserting the last relation into the following standard relation for the time derivative of a loop integral then completes the proof of Eq. (12) appearing in the statement of the theorem,

$$\frac{d}{dt}\oint_{c(\widehat{\mathbf{v}})}\left(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\right) = \oint_{c(\widehat{\mathbf{v}})}(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\left(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\right) = \oint_{c(\widehat{\mathbf{v}})} -\rho^{-1}d\widetilde{p} + d\widetilde{\varpi} = -\oint_{c(\widehat{\mathbf{v}})}\frac{1}{\rho}\,d\widetilde{p}\,, \tag{16}$$

since the loop integral of the exact differential $d\widetilde{\varpi}$ vanishes.

Using the advection relations for $D$ and $\rho$ in (11) again and combining with the variational relations with respect to $\zeta$ in (10) simplifies the $\widehat{w}$ and $\zeta$ equations, as follows,

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\widehat{w} &= (\partial_t + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}})\widehat{w} = -\frac{\rho_{ref}}{\sigma^2 Fr^2\rho}\,\zeta\,,\\
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\zeta &= (\partial_t + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}})\zeta = \widehat{w}\,.
\end{aligned} \tag{17}$$

After deriving these equations, one may finally evaluate the constraint *D* = 1 imposed by the variation in pressure *p* to obtain further simplifications.

**Corollary 2 (Kelvin–Noether Circulation Theorem for the Current)** *The Kelvin circulation theorem for the current alone is given by,*

$$\frac{d}{dt}\oint_{c(\widehat{\mathbf{v}})}\widehat{\mathbf{v}}\cdot d\mathbf{r} = -\oint_{c(\widehat{\mathbf{v}})}\left[\frac{1}{\rho}\,dp - d\frac{|\widehat{\mathbf{v}}|^2}{2}\right] = -\oint_{c(\widehat{\mathbf{v}})}\frac{1}{\rho}\,dp\,. \tag{18}$$

*Proof* Equation (18) follows by shifting the $\sigma^2\widehat{w}\,d\zeta$ term in Eq. (12) to the right-hand side, as

$$\begin{aligned}
\frac{d}{dt}\oint_{c(\widehat{\mathbf{v}})}\widehat{\mathbf{v}}\cdot d\mathbf{r} &= -\oint_{c(\widehat{\mathbf{v}})}\left[\frac{1}{\rho}\,d\widetilde{p} + \sigma^2(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})(\widehat{w}\,d\zeta) - d\widetilde{\varpi}\right]\\
&= -\oint_{c(\widehat{\mathbf{v}})}\left[\frac{1}{\rho}\,d\widetilde{p} + \sigma^2\big((\partial_t + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}})\widehat{w}\big)\,d\zeta + \sigma^2\widehat{w}\,d\widehat{w} - d\widetilde{\varpi}\right]\\
&= -\oint_{c(\widehat{\mathbf{v}})}\left[\frac{1}{\rho}\,d\widetilde{p} - \frac{\rho_{ref}}{Fr^2\rho}\,\zeta\,d\zeta + \sigma^2\widehat{w}\,d\widehat{w} - d\widetilde{\varpi}\right]\\
&= -\oint_{c(\widehat{\mathbf{v}})}\left[\frac{1}{\rho}\,dp - d\frac{|\widehat{\mathbf{v}}|^2}{2}\right] = -\oint_{c(\widehat{\mathbf{v}})}\frac{1}{\rho}\,dp\,.
\end{aligned} \tag{19}$$

*Remark 1 (Separation of Wave and Current Circulation)* The decoupling of the Kelvin–Noether circulation theorem into its wave and current components, leading to the reduction of the current flow to the Euler result in Eq. (18), was also observed in [8]. This behaviour is consistent with the Charney–Drazin 'non-acceleration' theorem [6, 23]. Namely, in certain circumstances, wave activity does not create circulation in the mean current. A modification that allows exchange of circulation between wave (vertical) and current (horizontal) components of the flow was proposed in [8]. The instabilities observed around the edges of eddies in the satellite imagery shown in Fig. 1 suggest that a coupling of this sort may exist at high wave number.

*Remark 2* It is clear from Eqs. (12)–(18) that generation of circulation of the current by the dynamics in Eq. (15) requires non-zero $\nabla_{\mathbf{r}}\rho\times\nabla_{\mathbf{r}}p$. No current circulation is generated by the wave variables in the case of constant buoyancy.

#### *2.3 Thermal Potential Vorticity (TPV) Dynamics on a Free Surface*

The momentum map arising from the variations in (10) is given by

$$
\frac{1}{D}\frac{\delta\ell}{\delta\widehat{\mathbf{v}}} = \rho\,\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\rho\,\widehat{w}\,d\zeta\,. \tag{20}
$$

As expected from the well-known non-acceleration theorem [6, 23], the Euler–Poincaré equation (15) separates into the dynamics of the fluid and wave components of the momentum one-form (20),

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\Big(\rho\,\big(\widehat{\mathbf{v}}\cdot d\mathbf{r}\big)\Big) &= -dp + \frac{\rho}{2}\,d\big(|\widehat{\mathbf{v}}|^2\big)\,,\\
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\Big(\sigma^2\rho\,\widehat{w}\,d\zeta\Big) &= -\frac{\rho_{ref}}{Fr^2}\,\zeta\,d\zeta + \sigma^2\rho\,\widehat{w}\,d\widehat{w}\,.
\end{aligned} \tag{21}$$

The mass-weighted thermal potential vorticity (TPV) also separates into fluid and wave components, $\mathcal{Q} = \mathcal{Q}_F + \mathcal{Q}_W$, with the following definitions

$$\begin{aligned}
\mathcal{Q}\,d^2r &= d\Big(\rho\big(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\big)\Big)\\
&= d\rho\wedge\big(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \sigma^2\widehat{w}\,d\zeta\big) + \rho\Big(\widehat{\mathbf{z}}\cdot\operatorname{curl}\widehat{\mathbf{v}} + \sigma^2 J(\widehat{w},\zeta)\Big)d^2r\\
&= \Big(\operatorname{div}\big(\rho\nabla\psi\big) + \sigma^2 J\big(\rho\widehat{w},\zeta\big)\Big)d^2r \quad\text{when}\quad \widehat{\mathbf{v}} = \nabla^{\perp}\psi \quad\text{for}\quad D = 1\,,\\
\text{with}\quad \mathcal{Q}_F &:= \operatorname{div}(\rho\nabla\psi)\,,\qquad \mathcal{Q}_W := J\big(\sigma^2\widetilde{w},\zeta\big)\,,
\end{aligned} \tag{22}$$

where the buoyancy-weighted vertical velocity is defined as $\widetilde{w} := \rho\,\widehat{w}$. The dynamics of $\mathcal{Q}_F\,d^2r$ and $\mathcal{Q}_W\,d^2r$ can be computed from (21) as

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})(\mathcal{Q}_F\,d^2r) &= \frac{1}{2}\,d\rho\wedge d\big(|\widehat{\mathbf{v}}|^2\big) = \frac{1}{2}J\big(\rho,|\nabla\psi|^2\big)\,d^2r\,,\\
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})(\mathcal{Q}_W\,d^2r) &= \frac{\sigma^2}{2}\,d\rho\wedge d\big(\widehat{w}^2\big) = \frac{1}{2}J\left(\rho,\frac{\sigma^2\widetilde{w}^2}{\rho^2}\right)d^2r\,.
\end{aligned} \tag{23}$$

From the two relations in (23), one sees that the buoyancy gradient $\nabla\rho$ couples the PV dynamics of the waves ($\mathcal{Q}_W$) and currents ($\mathcal{Q}_F$), each to their corresponding kinetic energy. In the case of constant buoyancy, $d\rho = 0$ in (23); so the PVs of the waves and currents would be separately advected.

The operator $\operatorname{div}(\rho\nabla)$ is invertible so long as $\rho$ is a differentiable positive function, which can be ensured by requiring that this condition holds initially. Consequently, the stream function $\psi$ is related to the other fluid variables by

$$\psi \coloneqq \left(\mathrm{div}\rho\nabla\right)^{-1}\mathcal{Q}\_F \,. \tag{24}$$
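One way to carry out this inversion numerically is sketched below on a doubly periodic domain: a fixed-point iteration preconditioned by the FFT inverse Laplacian. The scheme, grid, and test fields are our own illustrative choices, not prescribed by the text.

```python
import numpy as np

def invert_div_rho_grad(QF, rho, L=2*np.pi, n_iter=200):
    """Solve div(rho grad psi) = QF on a doubly periodic square by
    fixed-point iteration, preconditioned with an FFT inverse Laplacian.
    Minimal sketch; assumes rho > 0 with mild variation and mean-zero QF."""
    n = QF.shape[0]
    k = 2*np.pi*np.fft.fftfreq(n, d=L/n)
    kx, ky = np.meshgrid(k, k, indexing='ij')
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                # avoid 0/0; the mean mode is fixed below
    rho_bar = rho.mean()

    def grad(f):
        fh = np.fft.fft2(f)
        return (np.real(np.fft.ifft2(1j*kx*fh)),
                np.real(np.fft.ifft2(1j*ky*fh)))

    def div(fx, fy):
        return np.real(np.fft.ifft2(1j*kx*np.fft.fft2(fx)
                                    + 1j*ky*np.fft.fft2(fy)))

    psi = np.zeros_like(QF)
    for _ in range(n_iter):
        px, py = grad(psi)
        residual = QF - div(rho*px, rho*py)
        dh = -np.fft.fft2(residual/rho_bar)/k2   # inverse Laplacian step
        dh[0, 0] = 0.0                           # pin the free constant
        psi = psi + np.real(np.fft.ifft2(dh))
    return psi

# Manufactured solution: for rho = 1 + 0.2 cos x and psi = sin x cos y,
# div(rho grad psi) = -(2 + 0.6 cos x) sin x cos y (computed by hand).
n = 32
x = np.linspace(0, 2*np.pi, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
rho = 1.0 + 0.2*np.cos(X)
psi_true = np.sin(X)*np.cos(Y)
QF = -(2.0 + 0.6*np.cos(X))*np.sin(X)*np.cos(Y)
psi = invert_div_rho_grad(QF, rho)
print("max inversion error:", np.max(np.abs(psi - psi_true)))
```

The iteration contracts at a rate set by the relative density variation (here $0.2$), so a modest number of sweeps recovers $\psi$ to round-off.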

The potential vorticity dynamics can then be written in coordinate form as

$$\begin{aligned}
\partial_t\mathcal{Q}_F + J(\psi,\mathcal{Q}_F) &= J\left(\rho,\tfrac{1}{2}|\nabla_{\mathbf{r}}\psi|^2\right),\\
\partial_t\mathcal{Q}_W + J(\psi,\mathcal{Q}_W) &= J\left(\rho,\frac{\sigma^2\widetilde{w}^2}{2\rho^2}\right),\\
\text{with}\quad \mathcal{Q}_F := \operatorname{div}(\rho\nabla\psi) &\quad\text{and}\quad \mathcal{Q}_W := J\big(\sigma^2\widetilde{w},\zeta\big)\,,\\
\partial_t\rho + J(\psi,\rho) &= 0\,,\\
\partial_t\zeta + J(\psi,\zeta) &= \widehat{w} = \widetilde{w}/\rho\,,\\
\partial_t(\sigma^2\widetilde{w}) + J(\psi,\sigma^2\widetilde{w}) &= -\frac{\rho_{ref}\,\zeta}{Fr^2}\,.
\end{aligned} \tag{25}$$
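A pseudo-spectral discretisation of the Jacobian $J(a,b) = a_x b_y - a_y b_x$ is the main ingredient needed to march the system (25) in time. The sketch below implements $J$ and, as a smoke test of the advective structure, integrates only the buoyancy equation $\partial_t\rho + J(\psi,\rho) = 0$ with a frozen, illustrative stream function; it is emphatically not the full coupled solver behind the figures.

```python
import numpy as np

n, Lbox = 64, 2*np.pi
k = 2*np.pi*np.fft.fftfreq(n, d=Lbox/n)
KX, KY = np.meshgrid(k, k, indexing='ij')

def ddx(f):
    return np.real(np.fft.ifft2(1j*KX*np.fft.fft2(f)))

def ddy(f):
    return np.real(np.fft.ifft2(1j*KY*np.fft.fft2(f)))

def J(a, b):
    """Jacobian J(a,b) = a_x b_y - a_y b_x, evaluated pseudo-spectrally."""
    return ddx(a)*ddy(b) - ddy(a)*ddx(b)

def rk4_step(rho, psi, dt):
    """One RK4 step of the buoyancy equation rho_t + J(psi, rho) = 0."""
    f = lambda r: -J(psi, r)
    k1 = f(rho); k2 = f(rho + 0.5*dt*k1)
    k3 = f(rho + 0.5*dt*k2); k4 = f(rho + dt*k3)
    return rho + dt*(k1 + 2*k2 + 2*k3 + k4)/6.0

x = np.linspace(0, Lbox, n, endpoint=False)
X, Y = np.meshgrid(x, x, indexing='ij')
psi = np.cos(X)          # frozen illustrative stream function
rho = np.cos(Y)          # initial buoyancy
dt, steps = 0.01, 10
for _ in range(steps):
    rho = rk4_step(rho, psi, dt)

# The flow v = grad^perp(psi) = (0, -sin x) shears rho in y, so the
# exact solution is rho(x, y, t) = cos(y + t sin x).
exact = np.cos(Y + dt*steps*np.sin(X))
print("max advection error:", np.max(np.abs(rho - exact)))
```

The same `J` operator would drive the $\mathcal{Q}_F$, $\mathcal{Q}_W$, $\zeta$ and $\widetilde{w}$ equations, with $\psi$ recovered at each step by inverting $\operatorname{div}(\rho\nabla)$ as in (24).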

**Theorem 3** *The Legendre transform yields the Hamiltonian formulation of our system of wave–current equations* (25)*, which with $\widetilde{w} = \rho\widehat{w}$ may be written in the untangled block-diagonal Poisson form as*

$$\frac{\partial}{\partial t}\begin{bmatrix}\mathcal{Q}\\ \rho\\ \sigma^2\widetilde{w}\\ \zeta\end{bmatrix} = \begin{bmatrix}J(\mathcal{Q},\cdot) & J(\rho,\cdot) & 0 & 0\\ J(\rho,\cdot) & 0 & 0 & 0\\ 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0\end{bmatrix}\begin{bmatrix}\delta h/\delta\mathcal{Q} = \psi\\ \delta h/\delta\rho = \widetilde{\varpi}\\ \delta h/\delta(\sigma^2\widetilde{w}) = \widetilde{w}/\rho + J(\zeta,\psi)\\ \delta h/\delta\zeta = -J(\sigma^2\widetilde{w},\psi) + \dfrac{\rho_{ref}\,\zeta}{Fr^2}\end{bmatrix}, \tag{26}$$

with $\widetilde{\varpi} := \tfrac{1}{2}\big(|\nabla_{\mathbf{r}}\psi|^2 + \sigma^2\widetilde{w}^2/\rho^2\big)$.

*The energy Hamiltonian $h(\mathcal{Q},\rho,\widetilde{w},\zeta)$ associated with this system is given by*

$$\begin{aligned}
h(\mathcal{Q},\rho,\widetilde{w},\zeta) = \int &\frac{1}{2}\Big(\mathcal{Q} - J\big(\sigma^2\widetilde{w},\zeta\big)\Big)(\operatorname{div}\rho\nabla)^{-1}\Big(\mathcal{Q} - J\big(\sigma^2\widetilde{w},\zeta\big)\Big)\\
&+ \left(\frac{\sigma^2\widetilde{w}^2}{2\rho^2} + \frac{\rho_{ref}}{\rho}\frac{\zeta^2}{2Fr^2}\right)\rho\,d^2r\,.
\end{aligned} \tag{27}$$

**Theorem 4 (Casimir Functions)** *The Casimir functions, conserved by the relation $\{C_{\Phi,\Psi}, h\} = 0$ for any Hamiltonian $h$ for the block-diagonal Lie–Poisson bracket in Eq.* (26)*, are given by*

$$C\_{\Phi, \Psi} := \int \Phi(\rho) + \mathcal{Q}\Psi(\rho) \, d^2 r. \tag{28}$$

*Proof The Casimirs $C_{\Phi,\Psi}$ for the direct sum of the Lie–Poisson brackets for $\mathcal{Q}$ and $\rho$ and canonical Poisson brackets for $\widetilde{w}$ and $\zeta$ follow by direct verification that the $C_{\Phi,\Psi}$ are conserved for any differentiable functions $\Phi$ and $\Psi$.*

#### *2.4 C***◦***M Equations in the Slowly Varying Envelope (SVE) Approximation*

**The SVE Solutions Apply to Satellite Observations of Sea Surface Waves** From the viewpoint of satellite observations, the vertical motion on the sea surface typically oscillates much more quickly than the rate of change of features in the horizontal motion of the ocean surface currents. In this situation, the standard WKB approximation introduces a solution Ansatz for the slowly varying envelope (SVE) of the rapidly oscillating vertical wave elevation in the standard form [2, 11],

$$\zeta(\mathbf{r},t) = \Re\left(a(\mathbf{r},t)\exp\left(\frac{i\theta(\mathbf{r},t)}{\epsilon}\right)\right) \quad\text{with}\quad \epsilon \ll 1\,. \tag{29}$$

The SVE solution Ansatz (29) comprises the product of a slowly varying complex amplitude $a(\mathbf{r},t) \in \mathbf{C}$ multiplied by a rapidly oscillating phase $\theta(\mathbf{r},t)/\epsilon \in \mathbf{R}$ with $\epsilon \ll 1$, in which the phase function $\theta(\mathbf{r},t)$ may also vary slowly as a function of the space and time variables, $(\mathbf{r},t)$.
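The scale separation built into the Ansatz (29) can be illustrated numerically; the envelope, phase, and value of $\epsilon$ below are arbitrary choices for demonstration only.

```python
import numpy as np

# Illustrative slowly varying envelope a(t) and phase theta(t); epsilon
# separates the fast carrier oscillation from the slow modulation.
eps = 0.05
t = np.linspace(0.0, 1.0, 20001)
a = np.exp(-0.5*t)                      # slow envelope, O(1) time scale
theta = t                               # slowly varying phase function
zeta = np.real(a*np.exp(1j*theta/eps))  # rapidly oscillating elevation (29)

# Compare time scales: the carrier oscillates at rate (d theta/dt)/eps,
# while the envelope changes on an O(1) scale.
dzeta = np.gradient(zeta, t)
da = np.gradient(a, t)
ratio = np.max(np.abs(dzeta)) / np.max(np.abs(da))
print("typical |d zeta/dt| / |da/dt| ratio:", ratio)
```

The ratio scales like $1/\epsilon$, which is the separation the SVE expansion exploits when averaging Hamilton's principle over the fast phase.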

Following [11], let us substitute the SVE solution Ansatz (29) into Hamilton's principle in (9) and find the condition on the parameter $\epsilon \ll 1$ that will allow higher-order wave terms to be neglected. For this, one computes

$$\begin{aligned}
0 = \delta S_{SVE} &= \delta\int_a^b \ell_{SVE}(\widehat{\mathbf{v}},D,\rho;a,\theta)\,dt\\
&= \delta\int_a^b\!\!\int_{\mathcal{D}}\frac{1}{2}D\rho|\widehat{\mathbf{v}}|^2 - p(D-1) + \frac{\sigma^2}{2}D\rho\left(\frac{d\zeta}{dt}\right)^2 - \frac{\rho_{ref}}{\rho}\frac{\zeta^2}{2Fr^2}D\rho\,d^2r\,dt\\
&= \delta\int_a^b\!\!\int_{\mathcal{D}}\frac{1}{2}D\rho|\widehat{\mathbf{v}}|^2 - p(D-1) + \frac{\sigma^2}{8}D\rho\left(\left|\frac{da}{dt}\right|^2 + \frac{2}{\epsilon}\frac{d\theta}{dt}\,\Im\!\left(a^*\frac{da}{dt}\right) + \frac{|a|^2}{\epsilon^2}\left(\frac{d\theta}{dt}\right)^2\right) - \frac{\rho_{ref}}{\rho}\frac{|a|^2}{4Fr^2}D\rho\,d^2r\,dt\\
&= \delta\int_a^b\!\!\int_{\mathcal{D}}\frac{1}{2}D\rho|\widehat{\mathbf{v}}|^2 - p(D-1) + \frac{\sigma^2|a|^2}{8\epsilon^2}D\rho\left(\partial_t\theta + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\theta\right)^2 - \frac{\rho_{ref}}{\rho}\frac{|a|^2}{4Fr^2}D\rho\,d^2r\,dt + O(\sigma^2)\,,
\end{aligned} \tag{30}$$

where $d/dt := \partial_t + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}$ denotes the advective time derivative along the horizontal flow.

The leading-order wave term $O(\epsilon^{-2})$ with $\epsilon \ll 1$ in Hamilton's principle will dominate the solution, and the remaining wave terms in Eq. (30) may be neglected, when<sup>2</sup>

$$
\epsilon \ll 1, \quad \frac{\epsilon^2}{\sigma^2 F r^2} = O(1), \quad \text{and} \quad \sigma^2 F r^2 \ll 1. \tag{31}
$$

According to the estimates in (7), there is a range of physical parameters relevant to satellite observations in which the SVE approximation applies, for $\sigma^2 Fr^2 \ll 1$.

To continue the investigation of the SVE description of wave–current interactions on the sea surface, we take variations of the action integral (30) to find the following set of equations,

$$\begin{aligned}
\delta\widehat{\mathbf{v}}:&\quad \frac{\delta\ell}{\delta\widehat{\mathbf{v}}} = D\rho\left(\widehat{\mathbf{v}}\cdot d\mathbf{r} + \mathcal{N}\,d\frac{d\theta}{dt}\right)\otimes d^2r \quad\text{with}\quad \mathcal{N} := \frac{\sigma^2|a|^2}{4\epsilon^2}\,,\\
\delta|a|^2:&\quad \frac{\delta\ell}{\delta|a|^2} = \frac{\sigma^2}{8\epsilon^2}D\rho\left(\left(\frac{d\theta}{dt}\right)^2 - \frac{2\epsilon^2}{\sigma^2Fr^2}\frac{\rho_{ref}}{\rho}\right) = 0 \quad\text{at}\quad O\!\left(\frac{\sigma^2}{\epsilon^2}\right)\\
&\qquad\implies \frac{d\theta}{dt} =: -\omega + \widehat{\mathbf{v}}\cdot\mathbf{k} = \pm\frac{\sqrt{\rho\,\rho_{ref}}}{\rho}\,,\ \text{upon choosing } 2\epsilon^2 = \sigma^2Fr^2 \text{ as permitted by (31)},\\
&\qquad\text{with}\quad \omega(\mathbf{r},t) = -\partial_t\theta \quad\text{and}\quad \mathbf{k}(\mathbf{r},t) = \nabla_{\mathbf{r}}\theta\,,\\
\delta\theta:&\quad \frac{\delta\ell}{\delta\theta} = 0 \implies \partial_t\mathcal{A} + \operatorname{div}(\mathcal{A}\widehat{\mathbf{v}}) = 0\,, \quad\text{with}\quad \mathcal{A} := D\rho\,\mathcal{N}\,\frac{d\theta}{dt}\,,\\
\delta D:&\quad \frac{\delta\ell}{\delta D} = \frac{\rho}{2}|\widehat{\mathbf{v}}|^2 - p\,,\qquad \delta\rho:\quad \frac{\delta\ell}{\delta\rho} = \frac{D}{2}|\widehat{\mathbf{v}}|^2\,,\\
\delta p:&\quad D - 1 = 0 \implies \operatorname{div}_{\mathbf{r}}\widehat{\mathbf{v}} = 0\,.\\
\text{Hence,}&\quad \partial_t\mathcal{A} + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\mathcal{A} = 0 \implies \partial_t|a|^2 + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}|a|^2 = 0\,.
\end{aligned} \tag{32}$$

<sup>2</sup> The ratio $\epsilon^2/(\sigma^2Fr^2) = O(1)$ is required for the rate of change of the phase parameter $\theta(\mathbf{r},t)$ of the SVE wave solution Ansatz (29) to match the time scale of the density $\rho(\mathbf{r},t)$ in Eq. (30).

In the second line of (32) we see that stationarity of the action integral with respect to variations in $|a|^2$ imposes a constraint which relates the dynamics of the wave phase $\theta$ to the buoyancy; that is, $|a|^2$ acts as a Lagrange multiplier. This constraint relation involves the Doppler-shifted frequency of the waves, as shown in the third line of (32). In combination with conservation of the wave action density and the divergence-free condition on the fluid flow velocity $\widehat{\mathbf{v}}$, this constraint relation implies in the last line of (32) that the wave magnitude $|a|^2$ is advected by the fluid flow. Because of the oscillatory nature of the solution Ansatz (29), the sign of the wave phase in $d\theta/dt = \partial_t\theta + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\theta$ in the second line above is immaterial. Hence, hereafter, we will choose the positive root $d\theta/dt = \sqrt{\rho\,\rho_{ref}}/\rho$.

From the conservation of wave action density $\mathcal{A}$ in (32) and the definitions of the advected fluid variables, one finds that $|a|^2$, $D$ and $\rho$ satisfy the following advection relations

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})(D\,d^2r) = 0 &\implies \partial_t D + \operatorname{div}_{\mathbf{r}}(D\widehat{\mathbf{v}}) = 0 \quad\text{with}\quad D = 1\,,\\
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})\rho = 0 &\implies \partial_t\rho + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}\rho = 0\,,\\
(\partial_t + \mathcal{L}_{\widehat{\mathbf{v}}})|a|^2 = 0 &\implies \partial_t|a|^2 + \widehat{\mathbf{v}}\cdot\nabla_{\mathbf{r}}|a|^2 = 0\,,
\end{aligned} \tag{33}$$

where $\mathcal{L}_{\widehat{\mathbf{v}}}$ denotes the Lie derivative operation along the horizontal velocity vector field, $\widehat{\mathbf{v}}$. The Lie derivative notation $\mathcal{L}_{\widehat{\mathbf{v}}}$ provides coordinate-free brevity in proving the following Kelvin circulation theorem for thermal wave–current theory.

**Theorem 5 (Kelvin–Noether Circulation Theorem)** *The variational equations in* (32) *imply the following Kelvin circulation theorem*

$$\frac{d}{dt}\oint_{c(\widehat{\boldsymbol{v}})} \left(\widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \mathcal{N}\, d\frac{d\theta}{dt}\right) = -\oint_{c(\widehat{\boldsymbol{v}})} \frac{1}{\rho}\, dp \,. \tag{34}$$

*Proof* The Euler-Poincaré (EP) theorem [16] in this case yields

$$(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}})\, \frac{\delta \ell}{\delta \widehat{\boldsymbol{v}}} = \frac{\delta \ell}{\delta D} \diamond D + \frac{\delta \ell}{\delta \rho} \diamond \rho := D\, \nabla_r \frac{\delta \ell}{\delta D} - \frac{\delta \ell}{\delta \rho}\, \nabla_r \rho \,. \tag{35}$$

Here, the diamond ($\diamond$) operator is defined for a fluid advected quantity $f$ by

$$\left\langle \frac{\delta \ell}{\delta f} \diamond f \;,\; X \right\rangle_{\mathfrak{X}} = \left\langle \frac{\delta \ell}{\delta f} \;,\; -\mathcal{L}_X f \right\rangle_V \,. \tag{36}$$

In (36), $X \in \mathfrak{X}(\mathbb{R}^2)$ is a (smooth) vector field defined on $\mathbb{R}^2$ and $f \in V$, where $V$ is a vector space of advected quantities. These advected quantities are the scalar function $\rho$ and the areal density $D\, d^2r$.
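To see how the explicit right-hand side of (35) follows from the definition (36), one may compute directly (a standard verification, spelled out here for completeness). For the advected scalar $\rho$,

$$\left\langle \frac{\delta\ell}{\delta\rho} \diamond \rho \;,\; X\right\rangle = \left\langle \frac{\delta\ell}{\delta\rho} \;,\; -X\cdot\nabla_r\rho \right\rangle = \left\langle -\frac{\delta\ell}{\delta\rho}\,\nabla_r\rho \;,\; X\right\rangle,$$

while for the areal density, integrating by parts gives

$$\left\langle \frac{\delta\ell}{\delta D} \diamond D \;,\; X\right\rangle = \left\langle \frac{\delta\ell}{\delta D} \;,\; -\operatorname{div}_r(D X) \right\rangle = \left\langle D\, \nabla_r\frac{\delta\ell}{\delta D} \;,\; X\right\rangle,$$

which together yield the two terms on the right-hand side of (35).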

Upon using the advection relations for *D* and *ρ* in (33) and the corresponding variational derivatives in (32), the EP equation in (35) simplifies to

$$(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \left( \frac{1}{D\rho} \frac{\delta\ell}{\delta\widehat{\boldsymbol{v}}} \right) = \frac{1}{\rho}\, \nabla_r \frac{\delta\ell}{\delta D} - \frac{1}{D\rho} \frac{\delta\ell}{\delta\rho}\, \nabla_r \rho \,.$$

Evaluating the variational derivatives from (32) then yields

$$(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \left( \widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \mathcal{N}\, d\frac{d\theta}{dt} \right) = -\rho^{-1} dp + d\left( \frac{1}{2} |\widehat{\boldsymbol{v}}|^2 \right). \tag{37}$$

Inserting the last relation into the following standard relation for the time derivative of a loop integral then completes the proof of Eq. (34) appearing in the statement of the theorem:

$$\frac{d}{dt} \oint_{c(\widehat{\boldsymbol{v}})} \left( \widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \mathcal{N}\, d\frac{d\theta}{dt} \right) = \oint_{c(\widehat{\boldsymbol{v}})} (\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \left( \widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \mathcal{N}\, d\frac{d\theta}{dt} \right) = \oint_{c(\widehat{\boldsymbol{v}})} -\rho^{-1} dp + d\left(\frac{1}{2}|\widehat{\boldsymbol{v}}|^2\right). \tag{38}$$

The exact differential $d\big(\tfrac{1}{2}|\widehat{\boldsymbol{v}}|^2\big)$ integrates to zero around the closed loop $c(\widehat{\boldsymbol{v}})$, which recovers (34). $\square$

Note, however, that Eqs. (32) imply the following relation among the advected quantities,

$$(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \left( \mathcal{N}\, d\frac{d\theta}{dt} \right) = \frac{\sigma^2}{4 Fr^2} (\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \left( |a|^2\, d\sqrt{\frac{\rho_{ref}}{\rho}} \right) = 0 \,. \tag{39}$$

This follows because the exterior differential commutes with $\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}$ and both $|a|^2$ and $\rho$ are advected scalars. Consequently, the wave-momentum 1-form $\mathcal{N}\, d\big(\tfrac{d\theta}{dt}\big)$ is advected by the fluid flow and the Kelvin circulation theorem in Eq. (38) reduces to the standard circulation theorem for the 2D Euler fluid equations.

*Remark 3 (Separation of Wave and Current Motion in the SVE Approximation)* The decoupling of the Kelvin-Noether circulation theorem into its wave and current components for the SVE approximation is inherited from the un-approximated model. When modifications which remove this property are added to the un-approximated model, one would expect the new SVE approximation to lose the nonacceleration result.

*Remark 4* Equation (39) implies advection of the 1-form $|a|^2 d\rho$, which in turn implies advection of the Jacobian $J(|a|^2, \rho)$. Since the fluid flow is area-preserving, $\operatorname{div}\widehat{\boldsymbol{v}} = 0$, the following 2-form will also be advected:

$$\left(\partial_t + \widehat{\boldsymbol{v}} \cdot \nabla_r\right) \left(d|a|^2 \wedge d\rho\right) = 0 \,. \tag{40}$$

Thus, the divergence-free flow of $\widehat{\boldsymbol{v}}$ preserves the area element $d|a|^2 \wedge d\rho$. This means that if the gradients $\nabla|a|^2$ and $\nabla\rho$ are not aligned initially, then they will remain unaligned. It also means that equilibrium solutions of (40) will be symplectic manifolds [14].

After deriving these equations, one may finally evaluate the constraint *D* = 1 imposed by the variation in pressure *p* to obtain further simplifications.

#### *2.5 Thermal Potential Vorticity Dynamics with SVE on a Free Surface*

The momentum map arising from the variations of the action in (32) is given by

$$\frac{1}{D}\frac{\delta\ell}{\delta\widehat{\boldsymbol{v}}} = \rho \left(\widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \mathcal{N}\, d\frac{d\theta}{dt}\right) \quad \text{with} \quad \mathcal{N} := \frac{\sigma^2 N^2 |a|^2}{4} =: \Gamma |a|^2 \quad \text{and} \quad \frac{d\theta}{dt} = \sqrt{\frac{\rho_{ref}}{\rho}},$$

so

$$\frac{1}{D}\frac{\delta\ell}{\delta\widehat{\boldsymbol{v}}} = \rho \left(\widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \Gamma |a|^2\, d\big(\sqrt{\rho\rho_{ref}}/\rho\big)\right). \tag{41}$$

According to the Euler-Poincaré equation (37), the dynamics of the fluid and wave components of the 1-form in (41) *separates* into the following equations:

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \Big( \rho\, (\widehat{\boldsymbol{v}} \cdot d\boldsymbol{r}) \Big) &= -dp + \frac{\rho}{2}\, d\big( |\widehat{\boldsymbol{v}}|^2 \big),\\
(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}}) \Big( |a|^2\, d\sqrt{\rho \rho_{ref}} \Big) &= 0 \,.
\end{aligned} \tag{42}$$

This means that the mass-weighted thermal potential vorticity (TPV) dynamics also separates into fluid and wave components, $\mathcal{Q} = \mathcal{Q}_F + \mathcal{Q}_W$, given by

$$\begin{aligned}
\mathcal{Q}\, d^2 r &:= d \left( \rho \left( \widehat{\boldsymbol{v}} \cdot d\boldsymbol{r} + \Gamma |a|^2\, d\big(\sqrt{\rho\rho_{ref}}/\rho\big) \right) \right) \\
&= \left( \operatorname{div}(\rho \nabla \psi) - \Gamma J\big( |a|^2, \sqrt{\rho\rho_{ref}} \big) \right) d^2 r \quad \text{when} \quad \widehat{\boldsymbol{v}} = \nabla^{\perp} \psi \quad \text{for} \quad D = 1, \\
&= \mathcal{Q}_F\, d^2 r + \mathcal{Q}_W\, d^2 r\,,
\end{aligned} \tag{43}$$

with $\mathcal{Q}_F := \operatorname{div}(\rho\nabla\psi)$ and $\mathcal{Q}_W := \Gamma J\big(\sqrt{\rho\rho_{ref}}, |a|^2\big)$.

Then, again, the differentials of the separate equations in (42) yield the 'nonacceleration' result,

$$\begin{aligned}
(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}})(\mathcal{Q}_F\, d^2 r) &= \frac{1}{2}\, d\rho \wedge d|\widehat{\boldsymbol{v}}|^2 = \frac{1}{2} J\big(\rho, |\nabla\psi|^2\big)\, d^2 r, \\
(\partial_t + \mathcal{L}_{\widehat{\boldsymbol{v}}})(\mathcal{Q}_W\, d^2 r) &= 0 \,.
\end{aligned} \tag{44}$$

Equivalently, in coordinates one has


$$\begin{aligned}
\partial_t \mathcal{Q}_F + \widehat{\boldsymbol{v}} \cdot \nabla_r \mathcal{Q}_F &= \frac{1}{2} J\big(\rho, |\nabla\psi|^2\big), \quad \text{with} \quad \mathcal{Q}_F := \operatorname{div}(\rho\nabla\psi) \quad \text{and} \quad \mathcal{Q}_W := \Gamma J\big(\sqrt{\rho\rho_{ref}}, |a|^2\big), \\
\partial_t \rho + \widehat{\boldsymbol{v}} \cdot \nabla_r \rho &= 0, \quad \text{with} \quad \Gamma = \frac{\sigma^2}{4 Fr^2} = O(1), \\
\partial_t |a|^2 + \widehat{\boldsymbol{v}} \cdot \nabla_r |a|^2 &= 0, \\
\partial_t \theta + \widehat{\boldsymbol{v}} \cdot \nabla_r \theta &= \frac{\sqrt{\rho\rho_{ref}}}{\rho} \,.
\end{aligned} \tag{45}$$

The operator $(\operatorname{div}\rho\nabla)$ is invertible so long as $\rho$ is a differentiable positive function, which can be ensured by requiring that this condition hold initially, since $\rho$ is advected. Consequently, the stream function $\psi$ is related to the other fluid variables by

$$\psi := \left(\operatorname{div}\rho\nabla\right)^{-1} \mathcal{Q}_F \,. \tag{46}$$
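To make the elliptic inversion in (46) concrete, the sketch below solves $\operatorname{div}(\rho\nabla\psi) = \mathcal{Q}_F$ with a second-order finite-difference discretisation and homogeneous Dirichlet boundaries. This is a toy stand-in for the finite element solve actually used in the paper; the grid layout, face-averaging of $\rho$, and function name are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def solve_stream_function(q_f, rho, h):
    """Invert div(rho grad psi) = Q_F on an n x n grid of interior
    nodes with spacing h and psi = 0 on the boundary.

    A 5-point flux-form stencil is used; rho at cell faces is the
    arithmetic mean of the neighbouring nodal values (nodes outside
    the grid are clamped to the nearest interior value -- a modelling
    choice, not taken from the paper).
    """
    n = q_f.shape[0]
    A = sp.lil_matrix((n * n, n * n))

    def rho_face(i0, j0, i1, j1):
        def at(i, j):
            return rho[min(max(i, 0), n - 1), min(max(j, 0), n - 1)]
        return 0.5 * (at(i0, j0) + at(i1, j1))

    for i in range(n):
        for j in range(n):
            k = i * n + j
            rE = rho_face(i, j, i + 1, j)
            rW = rho_face(i, j, i - 1, j)
            rN = rho_face(i, j, i, j + 1)
            rS = rho_face(i, j, i, j - 1)
            A[k, k] = -(rE + rW + rN + rS) / h**2
            if i + 1 < n: A[k, k + n] = rE / h**2
            if i - 1 >= 0: A[k, k - n] = rW / h**2
            if j + 1 < n: A[k, k + 1] = rN / h**2
            if j - 1 >= 0: A[k, k - 1] = rS / h**2

    psi = spla.spsolve(A.tocsr(), q_f.ravel())
    return psi.reshape(n, n)
```

With $\rho \equiv 1$ the operator reduces to the Laplacian, so the solver can be checked against a manufactured solution such as $\psi = \sin(\pi x)\sin(\pi y)$ with $\mathcal{Q}_F = -2\pi^2\psi$.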

The dynamics of the equation set (45) explains why the various physical components of the flow coordinate their movements, as seen in satellite observations in Fig. 2. In particular, the motions of the buoyancy $\rho$ and the squared wave amplitude $|a|^2$ are coordinated with each other through the advection of the momentum 1-form $|a|^2 d\rho$ and the area 2-form $d|a|^2 \wedge d\rho$. Likewise, the motions of the fluid potential vorticity $\mathcal{Q}_F$ and the mass density $\rho$ are coordinated with each other through the mass-weighted definition of the stream function in (46). These considerations emphasise again the importance of horizontal buoyancy gradients in sea surface dynamics.

#### **3 Numerical Implementation**

Our implementation of the C◦M equations (25) and the C◦M equations in the SVE approximation (45) used the finite element method (FEM) for the spatial variables. The FEM algorithm we used is based on the algorithm formulated in [15] and is implemented using the Firedrake software.<sup>3</sup> In particular, for (25) we approximated the fluid potential vorticity $\mathcal{Q}_F$, buoyancy $\rho$, wave elevation $\zeta$ and buoyancy-weighted wave vertical velocity $\tilde{w}$ using a first-order discontinuous Galerkin finite element space. Similarly, for (45), we approximated $\mathcal{Q}_F$, $\rho$, the squared wave amplitude $|a|^2$ and the wave phase $\theta$ using a first-order discontinuous Galerkin finite element space. The stream function $\psi$ for both models was approximated using a first-order continuous Galerkin finite element space. For the time integration, we used the third-order strong stability preserving Runge-Kutta method [12].

<sup>3</sup> https://firedrakeproject.org/index.html.
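The third-order strong stability preserving Runge-Kutta scheme used here (due to Shu and Osher) advances $du/dt = f(u)$ through three convex Euler stages. A minimal sketch follows; the right-hand-side function `f` stands in for the FEM spatial discretisation, and the decay example is purely illustrative.

```python
def ssprk3_step(f, u, dt):
    """One step of the third-order strong-stability-preserving
    Runge-Kutta (SSPRK3) scheme of Shu and Osher for du/dt = f(u).
    Each stage is a convex combination of forward-Euler updates."""
    u1 = u + dt * f(u)                            # first Euler stage
    u2 = 0.75 * u + 0.25 * (u1 + dt * f(u1))      # second stage
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * f(u2))

# illustrative use: exponential decay du/dt = -u, integrated to t = 1
u, dt = 1.0, 0.01
for _ in range(100):
    u = ssprk3_step(lambda v: -v, u, dt)
```

The convex-combination structure is what gives the scheme its strong-stability property: it inherits any stability bound satisfied by the forward-Euler building block.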

Figures 3 and 4 present snapshots of high-resolution runs of the C◦M equations and the C◦M equations in the SVE approximation. These simulations were run with the following parameters. The domain is $[0, 1]^2$ at a resolution of $512^2$. The boundary conditions are periodic in the $x$ direction and homogeneous Dirichlet for $\psi$ in the $y$ direction. To see the effects of the waves on the currents, the procedure was divided into two stages for both sets of equations. The first stage was performed without wave activity for $T_{spin} = 100$ time units, starting from the following initial conditions:

$$\begin{aligned}
\mathcal{Q}_F(x, y, 0) &= \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) \\
&\quad + 0.1\sin(10\pi x)\cos(4\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x)\,, \\
\rho(x, y, 0) &= 1 + 0.2\sin(2\pi x)\sin(2\pi y) \quad \text{and} \quad \rho_{ref} = 1 \,.
\end{aligned} \tag{47}$$
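The spin-up fields (47) can be sampled on a uniform grid with NumPy as below. The nodal sampling, the function name, and the exact term list (which follows the reconstruction of (47) above) are illustrative; the simulations themselves use a $512^2$ finite element mesh.

```python
import numpy as np

def spinup_initial_conditions(n=512):
    """Sample the spin-up initial data (47) on an n x n grid of the
    unit square: the multi-mode potential vorticity Q_F and the
    buoyancy rho = 1 + 0.2 sin(2 pi x) sin(2 pi y)."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    X, Y = np.meshgrid(x, x, indexing="ij")
    q_f = (np.sin(8 * np.pi * X) * np.sin(8 * np.pi * Y)
           + 0.4 * np.cos(6 * np.pi * X) * np.cos(6 * np.pi * Y)
           + 0.3 * np.cos(10 * np.pi * X) * np.cos(4 * np.pi * Y)
           + 0.1 * np.sin(10 * np.pi * X) * np.cos(4 * np.pi * Y)
           + 0.02 * np.sin(2 * np.pi * Y) + 0.02 * np.sin(2 * np.pi * X))
    rho = 1.0 + 0.2 * np.sin(2 * np.pi * X) * np.sin(2 * np.pi * Y)
    return q_f, rho
```

Note that the buoyancy field stays within $[0.8, 1.2]$, so it remains strictly positive, as required for the invertibility of $(\operatorname{div}\rho\nabla)$ in (46).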

The purpose of the first stage was to allow the system to spin up to a statistically steady state without any wave activity. The PV and buoyancy variables at the end of the initial spin-up period are denoted $\mathcal{Q}_{spin}(x, y) = \mathcal{Q}_F(x, y, T_{spin})$ and $\rho_{spin}(x, y) = \rho(x, y, T_{spin})$; they are shown in Fig. 5. In the second stage, the full simulations including the wave variables were run, with the initial conditions for the flow variables being the state achieved at the end of the first stage. To start the second stage for (25), wave variables were introduced with the following initial conditions:

$$\begin{aligned}
\zeta(x, y, 0) &= \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) \\
&\quad + 0.1\sin(10\pi x)\cos(4\pi y) + 0.1\sin(10\pi x)\cos(6\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x)\,, \\
\tilde{w}(x, y, 0) &= 0, \quad \mathcal{Q}_F(x, y, 0) = \mathcal{Q}_{spin}(x, y), \quad \rho(x, y, 0) = \rho_{spin}(x, y), \\
\sigma^2 Fr^2 &= 10^{-2} \,.
\end{aligned} \tag{48}$$

To start the second stage for (45), wave variables were introduced with the following initial conditions

$$\begin{aligned}
|a|^2(x, y, 0) &= \Big(\sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) + 0.3\cos(10\pi x)\cos(4\pi y) \\
&\qquad + 0.1\sin(10\pi x)\cos(4\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x)\Big)^2, \\
\theta(x, y, 0) &= 0, \quad \mathcal{Q}_F(x, y, 0) = \mathcal{Q}_{spin}(x, y), \quad \rho(x, y, 0) = \rho_{spin}(x, y) \,.
\end{aligned} \tag{49}$$

**Fig. 5** These figures show the results of the first stage of the simulation, in which only fluid motion is present and the wave degrees of freedom are absent. The panels show the fluid potential vorticity $\mathcal{Q}_F$ (left) and the buoyancy $\rho$ (right). The fluid state obtained from the first stage was used as the initial condition for the second-stage simulations, in which the wave variables were included. These distributions of fluid properties show strong spatial coherence. The coordination of wave and fluid properties that emerges in the second stage of the simulations, shown in Figs. 3 and 4, arises from the interaction between the wave and current components of the flow, which is mediated by the buoyancy gradient.

*Remark 5* Importantly, the wave phase $\theta$ in the second stage was set initially to zero. Thereafter, the wave phase $\theta$ increased linearly in time, in proportion to the advected quantity $\sqrt{\rho\rho_{ref}}/\rho$ following each flow line, as implied by the last equation in (45).

#### **4 Conclusion and Outlook**

This paper models the effects of thermal fronts on the dynamics of the ocean's waves and currents. It introduces and simulates two models of thermal wave-current dynamics on a free surface. The original C◦M model is derived from Hamilton's principle via the composition of two maps which represent the horizontal and vertical motion, respectively. The second, a slowly varying envelope (SVE) model, is introduced via the standard WKB approximation, which takes advantage of the large separation of space-time scales between the slow horizontal currents and the fast vertical oscillations. In particular, the second model introduces the WKB solution Ansatz into Hamilton's principle, whereupon the time integral averages over the phases of the rapid oscillations that are out of resonance with the slowly varying envelope. Simulations of both models are presented in which the buoyancy mediates the dynamics of the currents and waves, as seen in Figs. 3 and 4. These simulations also validate the use of the WKB approximation, for two reasons. First, the resolved small-scale wave features of the original C◦M model lie primarily within the envelope defined by the SVE approximate model. This means that the dynamics of the spatial features of the SVE approximate model are consistent with those of the original C◦M model, although the resolved space and time scales differ. Secondly, requiring that $\Gamma = \sigma^2/(4Fr^2) = O(1)$ ensures that the time scale for the wave envelope dynamics matches that of the fluid motion.

Nonetheless, the two models introduced here merit further study in several directions. For example, it remains to: (1) quantify the correlations observed visually; (2) determine their rate of formation; and (3) parameterise the model for comparison and analysis of the satellite data on which their derivations were based. Furthermore, the models discussed here involve only variables that are evaluated on the free surface, and therefore they neglect bathymetry. A scientific challenge persists in understanding regions of the ocean where bathymetry has profound effects on the observable surface dynamics, such as in the Lofoten vortex [21]. This is a multiscale issue that might be addressed by including mesoscale modulations of the sub-mesoscale models derived here. One candidate for providing the mesoscale modulations would be the thermal quasi-geostrophic (TQG) model, in which bathymetry has recently been included [15].

The currents are modelled here by the two-dimensional incompressible Euler equations, as seen in Eqs. (2) and (3). Incompressibility is a reasonable assumption in some regions of the ocean, for example where the quasi-geostrophic approximation is valid. There are regions in the upper ocean where other equations are more suitable for modelling currents, and the development and investigation of such two-dimensional models is an open problem which warrants further consideration.

As mentioned in Remark 1, the wave component of the model presented here does not create circulation in the currents. The instabilities present in satellite observations indicate that additional modelling is needed to fully capture this effect. Future work will investigate approaches for modelling these instabilities.

Many other questions remain about wave-current interaction. The full extent of submesoscale ocean dynamics is by no means adequately described by existing models. For example, we have little understanding of the formation and dynamics of various sea-surface phenomena, including the so-called 'spirals on the sea' [18]. Other questions are emerging because the ocean has absorbed in excess of 90% of the heat present in the earth system as a result of human activity during the post-industrial era [19]. The absorption of heat from the warming atmosphere is ongoing and it is forecast to become more dramatic. This absorption has resulted in 'marine heat waves', which are predicted to increase in frequency and severity. These changes to the upper ocean, where most of this heat is stored, could have a profound effect on the dynamical landscape of our oceans. These effects may, in turn, influence our weather and climate systems. Over the millennia, the ocean has approached statistical equilibrium under its current forcing conditions. Using modelling terminology, one says the ocean is well 'spun-up'. However, the continued warming of the ocean is likely to influence the number and intensity of thermal fronts. One hopes that mathematical models will provide a useful framework for estimating some of the potential impacts of these thermal fronts on atmospheric effects, as well.

**Acknowledgments** We are grateful to our friends and colleagues who have generously offered their time, thoughts, and encouragement in the course of this work during the time of COVID-19. Thanks to A. Arnold, B. Chapron, D. Crisan, E. Luesink, A. Mashayek, and J. C. McWilliams for their thoughtful comments and discussions. Particular thanks to B. Chapron, for extensive discussions of satellite oceanography and for providing the satellite data in Figs. 1 and 2. We also thank B. Fox-Kemper for constructive discussions of modelling approaches in physical oceanography. These discussions helped us clarify the distinction between the present C◦M modelling approach and the classical balance equation approach. The authors are grateful for partial support, as follows. DH for European Research Council (ERC) Synergy grant STUOD - DLV-856408; RH for the EPSRC scholarship (Grant No. EP/R513052/1); and OS for the EPSRC Centre for Doctoral Training in the Mathematics of Planet Earth (Grant No. EP/L016613/1).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Variational Stochastic Parameterisations and Their Applications to Primitive Equation Models**

**Ruiao Hu and Stuart Patching**

**Abstract** We present a numerical investigation into the stochastic parameterisations of the Primitive Equations (PE) using the Stochastic Advection by Lie Transport (SALT) and Stochastic Forcing by Lie Transport (SFLT) frameworks. These frameworks were chosen due to their structure-preserving introduction of stochasticity, which decomposes the transport velocity and fluid momentum into their drift and stochastic parts, respectively. In this paper, we develop a new calibration methodology to implement the momentum decomposition of SFLT and compare with the Lagrangian path methodology implemented for SALT. The resulting stochastic Primitive Equations are then integrated numerically using a modification of the FESOM2 code. For certain choices of the stochastic parameters, we show that SALT causes an increase in the eddy kinetic energy field and an improvement in the spatial spectrum. SFLT also shows improvements in these areas, though to a lesser extent. SALT does, however, have the drawback of an excessive downwards diffusion of temperature.

**Keywords** Primitive equations · Geometric mechanics · FESOM2 · Stochastic parameterisation

#### **1 Introduction**

Uncertainty can be present in ocean models due to a number of factors including, but not limited to: small-scale processes not resolved by the grid; observation error; model error; numerical error and unrealistic viscosities imposed to ensure numerical stability. Several stochastic parameterisation techniques [PZ14, Ber05, Mem14, Hol15, HH21] have been proposed recently as ways of representing uncertainty in ocean models. Because these parameterisations are probabilistic, it is possible to

R. Hu · S. Patching
Department of Mathematics, Imperial College London, London, UK
e-mail: ruiao.hu15@imperial.ac.uk; s.patching17@imperial.ac.uk

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_9

generate ensemble forecasts [CCH+19, CCH+20, Cot20, UJPD21] with associated means and variances, which can then be applied to data assimilation. This work will focus on two frameworks which introduce stochasticity in a way that preserves certain fundamental and desirable properties of fluid flows. These frameworks are: Stochastic Advection by Lie Transport (SALT) [Hol15] and Stochastic Forcing by Lie Transport (SFLT) [HH21]. Both SALT and SFLT are derived from variational principles, from which we may observe the geometric structure of the fluid equations and the conservation laws which are inherited.

The key assumption of SALT is the decomposition of the transport velocity into a slow mean part and a fast, rapidly fluctuating part around the mean. In the limit of high fluctuation frequency, one can use homogenisation theory to transform the rapidly fluctuating component into a sum of stochastic vector fields [CGH17]. Thus, the modification from the deterministic flow is the addition of stochastic vector fields to the transport velocity. This stochastic modification has been shown [Hol15] to preserve the Kelvin circulation theorem and the advection equation for potential vorticity. In the case where buoyancy obeys an advection relation, the potential vorticity is conserved along particle paths. However, SALT violates energy conservation, since stochastic Hamiltonians are introduced into the variational principle. The application of SALT in quasi-geostrophic (QG) models and the 2D Euler equations has been investigated before in [CCH+20, CCH+19, Cot20]. However, these models are too simplistic to be used in operational ocean simulations, and the majority of ocean codes (e.g. MOM5 [GBB+00], ICON [Kor17], MITgcm [MAH+97], FESOM2 [DSWJ16]) solve the Primitive Equations (PE). For this reason, if SALT is to be employed in practical applications, it must be adapted for use in PE. This introduces additional features compared with the QG or 2D Euler models: in PE there are advected quantities such as temperature and salinity, which in the SALT framework are advected by the stochastic velocity. There is, moreover, a subtlety in the pressure, arising from the imposition of a semi-martingale Lagrange multiplier in the incompressibility condition of the variational principle [SC21].

An alternative stochastic parameterisation is the more recent SFLT framework [HH21]. Derived via a Lagrange-d'Alembert principle, SFLT allows the addition of arbitrary stochastic forcings to the evolution equations of the momentum and of the advected quantities. This modification differs from SALT, as stochasticity is added in the variational principle *after* taking variations of the Hamiltonian for the deterministic system. By considering the Lie-Poisson bracket of the system, we choose the forcing to be of a particular form that preserves, on every realisation of the noise, the original (deterministic) Hamiltonian. For PE, the Hamiltonian is given in Eq. (2). However, the addition of energy-preserving forces will modify the Kelvin circulation theorem. In the current work, we will consider the case where the stochastic forcing is in the energy-conserving form and applied to the momentum equation. As in the SALT case, stochastic pressure terms will appear in the momentum equation due to the imposition of a semi-martingale Lagrange multiplier in the incompressibility constraint. Prior to the present work, SFLT had not been implemented in numerical models.

The rest of the paper is structured as follows. In Sect. 2, we derive PE with both SALT and SFLT from a variational principle and we show the conservation properties from the resulting equations. In Sect. 3, we consider calibration procedures to calculate the stochastic parameters of SALT and SFLT. In particular, we use the Lagrangian paths method of [CCH+20] but also consider a simpler technique, that of Eulerian differences, which we propose is more appropriate for use in SFLT. In Sect. 4, we present numerical results of applying SALT and SFLT to FESOM2 [DSWJ16] (see Sect. 5), demonstrating the different effects of these stochastic frameworks and the sensitivity to the choice of parameters.

#### **2 Stochastic Primitive Equations**

#### *2.1 Variational Principles for Stochastic Primitive Equations*

Variational principles may be used to derive systems of fluid equations [HMR98, HSS09] which obey conservation laws such as the Kelvin circulation theorem. To derive the Primitive Equations from a variational principle, the appropriate Lagrangian is [HSS09]:

$$l(\mathbf{u}, D, T, S) = \int \left(\frac{1}{2} \left| \mathbf{u} \right|^2 + \mathbf{u} \cdot \mathbf{R} - V(T, S, z) \right) D d^3 \mathbf{x} \,, \tag{1}$$

where $\mathbf{u} = (u, v)$ is the horizontal velocity vector field, $\mathbf{R}$ is the Coriolis potential, which satisfies $\operatorname{curl}\mathbf{R} = f(y)\hat{\mathbf{z}}$ with $f(y) = 2\Omega\cos y$, and $\Omega = 2\pi/\text{day}$ is the rotational frequency of the Earth. $T$ and $S$ are the temperature and salinity, respectively; these are tracers advected by the fluid. $D$ is the Jacobian of the flow map $g_t$ that maps a fluid particle at initial position $\mathbf{x}_0$ to its position $\mathbf{x}_t = g_t\mathbf{x}_0$ at time $t$. $V$ is the potential energy, which depends explicitly on $T$ and $S$, as well as on the vertical coordinate $z$. The three-dimensional velocity shall be denoted $\mathbf{v} = (\mathbf{u}, w)$.

In order to obtain the correct hydrostatic balance condition, the potential energy should obey $\frac{\partial V}{\partial z}(T, S, z) = g(1 + b)$, where the partial derivative is taken with respect to $z$ at constant $T, S$, and $b$ is the buoyancy, given by the equation of state $b = b(T, S, z)$.

It is convenient here to use the Clebsch version of the variational principle [CH09] in Hamiltonian form. The Hamiltonian is given by the Legendre transformation $h(\mathbf{m}^h, D, T, S) := \big\langle \mathbf{u}, \frac{\delta l}{\delta \mathbf{u}} \big\rangle - l(\mathbf{u}, D, T, S)$, where $\mathbf{m}^h := \frac{\delta l}{\delta \mathbf{u}} = D(\mathbf{u} + \mathbf{R})$ is the horizontal momentum. We have also defined the inner product $\langle p, q \rangle = \int p \cdot q \, d^3x$. We shall use the same angle-bracket notation for all such pairings when $p$ and $q$ are dual variables, e.g. a vector field and a 1-form density, or a scalar and a density. The Hamiltonian can be written explicitly as:

$$h(\mathbf{m}^h, D, T, S) = \int \left(\frac{1}{2} \left| \frac{\mathbf{m}^h}{D} - \mathbf{R} \right|^2 + V(T, S, z) \right) D d^3 \mathbf{x} \,. \tag{2}$$

In the Clebsch variational principle when SALT or SFLT are present, the (3-dimensional) transport velocity $\mathrm{d}\chi$ is defined to be a stochastic process. The form of $\mathrm{d}\chi$ is specified using Lagrange-multiplier constraints that impose the transport equations $(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\, a = 0$, where $a \in \{D, T, S\}$ [see SW68]. Here we remark that, for clarity, we denote by an italic $d$ the spatial differential and by an upright $\mathrm{d}$ the stochastic time-increment. $\mathcal{L}_{\mathrm{d}\chi}$ denotes the Lie derivative, which is a differential operator whose form depends on the object on which it acts. We remark that, with a slight abuse of notation, we shall write $D$ as shorthand for the density 3-form $D\, d^3x$, so that the Lie derivative is given by $\mathcal{L}_{\mathrm{d}\chi} D = \nabla \cdot (\mathrm{d}\chi\, D)$. $T$ and $S$ are scalars, so we have $\mathcal{L}_{\mathrm{d}\chi} T := \mathrm{d}\chi \cdot \nabla T$, and similarly for $S$. In order to obtain the incompressibility of the transport velocity $\mathrm{d}\chi$, we include an additional constraint setting $D = 1$, whose Lagrange multiplier will be interpreted as the pressure. Since the Hamiltonian $h$ depends only on the horizontal momentum $\mathbf{m}^h$, we need to include an extra constraint so that the vertical component of the momentum is set to zero; this will give us hydrostatic balance.

The defining feature of SALT is that the transport velocity is the sum of the drift velocity and a number of stochastic corrections to the drift:

$$\mathrm{d}\chi(\mathbf{x}, t) := \mathbf{v}(\mathbf{x}, t)\,\mathrm{d}t + \sum_i \boldsymbol{\xi}_i(\mathbf{x}, t) \circ \mathrm{d}W_t^i \,, \tag{3}$$

where the $\boldsymbol{\xi}_i(\mathbf{x}, t)$ are arbitrary vector fields. We remark here that Eq. (3) is a stochastic process at fixed Eulerian points $\mathbf{x}$ and we do not solve for this process explicitly. $\mathrm{d}\chi$ is distinct from the particle trajectories $\mathbf{x}_t$, which evolve in time according to $\mathrm{d}\mathbf{x}_t = \mathbf{v}(\mathbf{x}_t, t)\,\mathrm{d}t + \sum_i \boldsymbol{\xi}_i(\mathbf{x}_t, t) \circ \mathrm{d}W_t^i$ and will be used in the calibration procedures in Sect. 3. We can impose the form of the transport velocity specified in Eq. (3) by including in the action additional stochastic Hamiltonians $\sum_i h_i(\mathbf{m}^h) \circ \mathrm{d}W_t^i$, where the horizontal component of the parameters is given by $\boldsymbol{\xi}_i^h(\mathbf{x}, t) = \frac{\delta h_i}{\delta \mathbf{m}^h}$. The three-dimensional momentum is denoted $\mathbf{m} = (\mathbf{m}^h, m_3)$. We note that in principle $\boldsymbol{\xi}_i$ may depend on time; however, we shall henceforth assume for simplicity that $\boldsymbol{\xi}_i = \boldsymbol{\xi}_i(\mathbf{x})$ is a function of space only. When the $h_i$ are independent of $\mathbf{m}^h$, we have $\mathrm{d}\chi(\mathbf{x}, t) := \mathbf{v}(\mathbf{x}, t)\,\mathrm{d}t$, so that $\mathrm{d}\chi$ reduces to the original deterministic transport.
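For intuition, a single Lagrangian trajectory driven by the SALT velocity decomposition can be sampled with the stochastic Heun scheme, whose predictor-corrector averaging converges to the Stratonovich solution. The velocity field, noise amplitude, and function name below are illustrative assumptions, not the calibrated $\boldsymbol{\xi}_i$ fields of Sect. 3.

```python
import numpy as np

def salt_trajectory(v, xis, x0, dt, n_steps, rng):
    """Sample dx_t = v(x_t) dt + sum_i xi_i(x_t) o dW_t^i using the
    Heun (predictor-corrector) scheme, consistent with the
    Stratonovich interpretation of the noise."""
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), size=len(xis))
        def increment(y):
            # deterministic drift plus the stochastic transport terms
            return v(y) * dt + sum(w * xi(y) for w, xi in zip(dW, xis))
        pred = x + increment(x)                        # Euler predictor
        x = x + 0.5 * (increment(x) + increment(pred))  # Heun corrector
        path.append(x.copy())
    return np.array(path)

# illustrative fields: solid-body rotation plus one weak stochastic mode
v = lambda y: np.array([-y[1], y[0]])
xis = [lambda y: 0.05 * np.array([1.0, 0.0])]
path = salt_trajectory(v, xis, x0=(1.0, 0.0), dt=1e-3,
                       n_steps=1000, rng=np.random.default_rng(42))
```

With the noise switched off (`xis=[]`), the scheme reduces to the classical second-order Heun method for the deterministic trajectory.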

SFLT is included [HH21] via a Lagrange-d'Alembert term $\langle \delta\mathrm{d}\chi, \mathbf{F} \rangle$ added to the variation of the action $\delta S$. Since this is added after variations of the action are taken, the forcing $\mathbf{F}$ can in principle be arbitrary. Overall, the variational principle takes the following form:


$$\begin{split} 0 = \delta \mathcal{S} &= \delta \int \langle \mathrm{d}\chi, \mathbf{m} \rangle - h(\mathbf{m}^{h}, D, T, S)\,\mathrm{d}t - \langle \mathrm{d}\zeta, m_{3} \rangle - \langle \mathrm{d}P, D - 1 \rangle \\ &\quad + \langle \alpha, \left( \mathrm{d} + \mathcal{L}_{\mathrm{d}\chi} \right) D \rangle + \langle \beta, \left( \mathrm{d} + \mathcal{L}_{\mathrm{d}\chi} \right) T \rangle + \langle \gamma, \left( \mathrm{d} + \mathcal{L}_{\mathrm{d}\chi} \right) S \rangle \\ &\quad - \underbrace{\sum_{i=1}^{N_{\xi}} h_{i} \left( \mathbf{m}^{h}, \boldsymbol{\xi}_{i}^{(h)} \right) \circ \mathrm{d}W_{t}^{i}}_{\mathrm{SALT}} \; - \underbrace{\int \langle \delta \mathrm{d}\chi, \mathbf{F} \rangle}_{\mathrm{SFLT}} \,. \end{split} \tag{4}$$

The first two lines of Eq. (4) are what would be included in the unmodified variational principle. $\mathrm{d}\zeta$ is a Lagrange multiplier enforcing $m_3 = 0$; after taking variations it can be interpreted as the vertical component of the stochastic transport velocity. Indeed, we may expand $\mathrm{d}\zeta = w\,\mathrm{d}t + \sum_i \xi^{(z)}_i \circ \mathrm{d}W^i_t$; note that here $\mathrm{d}\zeta$ is varied, so the third component of $\boldsymbol{\xi}_i$ is treated as a variable in the action, whereas the horizontal components are treated as fixed parameters. The final term on the top line enforces incompressibility, and the Lagrange multiplier $\mathrm{d}P$ must be stochastic, since a semi-martingale Lagrange multiplier is required to enforce a condition on the semi-martingale $D$ [see SC21]. On the second line the quantities $\alpha, \beta, \gamma$ are Lagrange multipliers enforcing the fact that $D, T, S$ are advected quantities. The final line contains the modifications required to include SALT or SFLT; in practice we shall not use SALT and SFLT together, but for compactness of the presentation we include them together here. The first modification, giving SALT, consists of a sum of $N_\xi$ Hamiltonians multiplied by Stratonovich noise. The second, additional term is a Lagrange-d'Alembert term which introduces a shift $\mathbf{F}$ in the momentum. We remark that by including further Lagrange-d'Alembert terms such as $\langle\delta\alpha, \mathrm{d}F_D\rangle$ or $\langle\delta\beta, \mathrm{d}F_T\rangle$, etc., we may add arbitrary forcings to the right-hand sides of the equations for the advected tracers. However, we do not consider this here.

The equations resulting from the variational principle *δ*S = 0 are:

$$\delta \mathbf{m}^h : \qquad \mathrm{d}\boldsymbol{\chi}^{(h)} = \frac{\delta h}{\delta \mathbf{m}^h}\,\mathrm{d}t + \sum_i \frac{\delta h_i}{\delta \mathbf{m}^h} \circ \mathrm{d}W^i_t \; ; \tag{5a}$$

$$\delta m_3 : \qquad \mathrm{d}\chi^{(z)} = \mathrm{d}\zeta \; ; \tag{5b}$$

$$\delta \mathrm{d}\boldsymbol{\chi} : \qquad \mathbf{m} = \alpha \diamond D + \beta \diamond T + \gamma \diamond S + \mathbf{F} \; ; \tag{5c}$$

$$\delta\alpha : \qquad \left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) D = 0 \; ; \tag{5d}$$

$$\delta\beta : \qquad \left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) T = 0 \; ; \tag{5e}$$

$$\delta\gamma : \qquad \left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) S = 0 \; ; \tag{5f}$$

$$\delta D : \qquad \left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) \alpha = -\left(\mathrm{d}P + \frac{\delta h}{\delta D}\,\mathrm{d}t\right) ; \tag{5g}$$

$$\delta T : \qquad \left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) \beta = -\frac{\delta h}{\delta T}\,\mathrm{d}t \; ; \tag{5h}$$

$$\delta S : \qquad \left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) \gamma = -\frac{\delta h}{\delta S}\,\mathrm{d}t \; ; \tag{5i}$$

$$\delta \mathrm{d}P : \qquad D = 1 \; ; \tag{5j}$$

$$\delta \mathrm{d}\zeta : \qquad m_3 = 0 \; . \tag{5k}$$

The diamond in Eq. (5c) is a binary operator acting on two variables that are dual with respect to the pairing $\langle\cdot,\cdot\rangle$ (e.g. a scalar and a density) and returning a 1-form density. Explicitly, for two dual variables $p, q$ and an arbitrary vector field $X$, the diamond is defined by the relation $\langle p \diamond q, X\rangle = -\langle p, \mathcal{L}_X q\rangle$. We can compute these explicitly as follows:

$$
\frac{\delta h}{\delta D} \diamond D = D \nabla \frac{\delta h}{\delta D}, \qquad \frac{\delta h}{\delta T} \diamond T = -\frac{\delta h}{\delta T} \nabla T \,, \qquad \frac{\delta h}{\delta S} \diamond S = -\frac{\delta h}{\delta S} \nabla S \,. \tag{6}
$$

We note that the form of $\mathrm{d}\chi$ given in Eq. (3) is not an input to the variational principle, but a consequence of it. Indeed, we obtain Eq. (3) by defining $\mathbf{v} := (\delta h/\delta\mathbf{m}^h, w)$ and $\boldsymbol{\xi}_i := (\delta h_i/\delta\mathbf{m}^h, \xi^{(z)}_i)$. The horizontal velocity is therefore $\mathbf{u} = \delta h/\delta\mathbf{m}^h = \mathbf{m}^h/D - \mathbf{R}$. The fact that $D = 1$, combined with Eq. (5d), gives the incompressibility condition $\nabla\cdot\mathrm{d}\chi = \nabla^{(h)}\cdot\mathrm{d}\chi^{(h)} + \frac{\partial}{\partial z}\mathrm{d}\zeta = 0$. By the Doob-Meyer decomposition [Doo53, Mey62, Mey63], we can split the incompressibility condition into its drift part and its stochastic oscillations. Thus we can compute $w$ and $\xi^{(z)}_i$ in terms of $\mathbf{u}$ and $\boldsymbol{\xi}^{(h)}_i$ respectively:

$$\nabla^{(h)} \cdot \mathbf{u} + \frac{\partial w}{\partial z} = 0 \,, \qquad \nabla^{(h)} \cdot \boldsymbol{\xi}_{i}^{(h)} + \frac{\partial \xi_{i}^{(z)}}{\partial z} = 0 \,. \tag{7}$$

Boundary conditions at $z = 0$ allow us to integrate Eq. (7) in the vertical direction. To obtain the momentum equation we apply $\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}$ to both sides of Eq. (5c) and use the fact that the Lie derivative obeys a Leibniz rule with respect to the diamond operator. After some rearranging, we obtain:

$$\left(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}\right) \left(\frac{\mathbf{m}^h - \mathbf{F}}{D} \cdot d\mathbf{x}\right) = -d\left(\left(\frac{\delta h}{\delta D} - V\right)\mathrm{d}t + \mathrm{d}P\right) + \frac{\partial V}{\partial z}\, dz\, \mathrm{d}t \,. \tag{8}$$

We shall show in Sect. 2.2 that the SFLT terms conserve energy if the momentum shift $\mathbf{F}$ takes a particular form, namely that it satisfies $(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\mathbf{F} = \mathcal{L}_{\mathbf{v}}\mathrm{d}\Phi$ for some stochastic process $\mathrm{d}\Phi$. In this work, we shall assume further that $\mathrm{d}\Phi$ has the form $\mathrm{d}\Phi = \sum_I \boldsymbol{\phi}_I \circ \mathrm{d}B^I_t$ for some spatially dependent parameters $\boldsymbol{\phi}_I$, with $B^I_t$ a set of independent Brownian motions. Because the momentum $\mathbf{m} = (\mathbf{m}^h, 0)$ has only horizontal components, we shall assume that the $\boldsymbol{\phi}_I$ also have only horizontal components. Moreover, we can expand the pressure in terms of its drift component and Brownian increments: $\mathrm{d}P = p\,\mathrm{d}t + \sum_i p_i \circ \mathrm{d}W^i_t + \sum_I p_I \circ \mathrm{d}B^I_t$. Thus, writing $\mathbf{m}^h = \mathbf{u} + \mathbf{R}$ and expanding $\mathrm{d}\chi$ in terms of $\mathbf{v}$ and $\boldsymbol{\xi}_i$, we find that Eq. (8) becomes:

$$\begin{aligned} &\mathrm{d}\mathbf{u} + \left[\nabla \cdot (\mathbf{v}\mathbf{u}) + f\hat{\mathbf{z}} \times \mathbf{v} + \nabla p + g(1+b)\hat{\mathbf{z}}\right] \mathrm{d}t \\ &\quad + \sum_{i} \left[\nabla \cdot \left(\boldsymbol{\xi}_{i}\mathbf{u}\right) + f\hat{\mathbf{z}} \times \boldsymbol{\xi}_{i} + \nabla \boldsymbol{\xi}_{i} \cdot \mathbf{u} + \nabla \left(p_{i} + \boldsymbol{\xi}_{i} \cdot \mathbf{R}\right)\right] \circ \mathrm{d}W_{t}^{i} \\ &\quad - \sum_{I} \left[\nabla \cdot \left(\mathbf{v}\boldsymbol{\phi}_{I}\right) - \nabla \boldsymbol{\phi}_{I} \cdot \mathbf{v} - \nabla \left(p_{I} - \mathbf{v} \cdot \boldsymbol{\phi}_{I}\right)\right] \circ \mathrm{d}B_{t}^{I} = 0 \,, \end{aligned} \tag{9}$$

The first line of Eq. (9) contains the terms of the deterministic momentum equation, the second line contains the SALT terms and the final line contains the SFLT contributions. Equation (9) is a three-dimensional equation, but the third component is the (diagnostic) hydrostatic balance condition rather than a prognostic evolution equation for *w*. In the cases of SALT and SFLT hydrostatic balance includes additional constraints on the stochastic parts of the pressure d*P*:

$$\frac{\partial p}{\partial z} = -g(1 + b) \,, \qquad \frac{\partial p'_i}{\partial z} = -\frac{\partial \boldsymbol{\xi}_i}{\partial z} \cdot \mathbf{u} \,, \qquad \frac{\partial p'_I}{\partial z} = -\frac{\partial \boldsymbol{\phi}_I}{\partial z} \cdot \mathbf{v} \,, \tag{10}$$

where we define the shifted stochastic pressure terms $p'_i := p_i + \boldsymbol{\xi}_i \cdot \mathbf{R}$ and $p'_I := p_I - \mathbf{v} \cdot \boldsymbol{\phi}_I$. We solve Eq. (10) by imposing the following surface pressure boundary conditions:

$$p|_{z=0} = g\,\eta \,, \qquad p'_i|_{z=0} = \psi_i \,, \qquad p'_I|_{z=0} = \psi_I \,, \tag{11}$$

where $\eta$ is the free surface height. The boundary condition on $p$ is that used in the linear free surface approximation, which is employed in FESOM2 [DSWJ16]. $\psi_i$ and $\psi_I$ are arbitrary functions of the horizontal coordinates only. They may be used to introduce stochastic atmospheric forcing at the ocean surface, but we do not consider this in the present work. For simplicity we set $\psi_i = \psi_I = 0$ for all $i, I$. Solving Eq. (10) with the boundary conditions in Eq. (11) gives the following:

$$p = g(\eta - z) + g \int_z^0 b\, dz'\,, \tag{12a}$$

$$p'_i = \psi_i + \int_z^0 \frac{\partial \boldsymbol{\xi}_i}{\partial z'} \cdot \mathbf{u}\, dz' \,, \tag{12b}$$

$$p'_I = \psi_I + \int_z^0 \frac{\partial \boldsymbol{\phi}_I}{\partial z'} \cdot \mathbf{v}\, dz' \,. \tag{12c}$$

A more exact condition on the deterministic pressure would be *p*|*z*=*<sup>η</sup>* = 0. Using this gives almost the same result for *p* except that the upper limit of the integral will instead be *η*.
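The vertical integrals in Eq. (12) are straightforward to discretise on a layered grid. A minimal numpy sketch of Eq. (12a) follows; the function name and the trapezoidal quadrature are our choices for illustration, not the FESOM2 discretisation:

```python
import numpy as np

def hydrostatic_pressure(eta, b, z):
    """Evaluate Eq. (12a), p = g*(eta - z) + g * int_z^0 b dz',
    on a vertical grid z[0] = 0 > z[1] > ... (depths are negative).
    b holds the buoyancy sampled at the same levels."""
    g = 9.81
    dz = -np.diff(z)                      # positive layer thicknesses
    mid = 0.5 * (b[:-1] + b[1:])          # trapezoidal midpoint values
    # cumulative integral of b from the surface down to each level
    integral = np.concatenate(([0.0], np.cumsum(mid * dz)))
    return g * (eta - z) + g * integral
```

With zero buoyancy this reduces to the hydrostatic pressure of a homogeneous fluid, $p = g(\eta - z)$.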

The equation for the evolution of the free surface height $\eta$ is obtained by integrating the incompressibility condition and using appropriate surface boundary conditions. For the linear free surface approximation we take $w|_{z=0}\,\mathrm{d}t = \mathrm{d}\eta$; at the bottom boundary $z = -H(x,y)$ we have $\mathrm{d}\chi|_{z=-H} \cdot \nabla(z + H) = 0$. Thus, integrating the incompressibility condition in the vertical direction from $z = -H$ to $z = 0$, we find, in the linear free surface case:

$$\mathbf{d}\eta + \nabla \cdot \int\_{-H}^{0} \mathbf{u}dt \, dz = 0 \,. \tag{13}$$

Again, the more exact boundary condition would be $\mathrm{d}\chi|_{z=\eta} \cdot \nabla(z - \eta) = \mathrm{d}\eta$, in which case Eq. (13) is modified by $\mathbf{u}\,\mathrm{d}t \to \mathbf{u}\,\mathrm{d}t + \sum_i \boldsymbol{\xi}_i \circ \mathrm{d}W^i_t$ and the upper limit of the integral becomes $\eta$ rather than 0. However, for our numerical simulations we use the linear free surface.

From Eqs. (5e) and (5f) we have the advection equations:

$$\mathrm{d}T + \mathbf{v} \cdot \nabla T\,\mathrm{d}t + \sum_{i} \boldsymbol{\xi}_{i} \cdot \nabla T \circ \mathrm{d}W^{i}_{t} = 0\,,\tag{14}$$

$$\mathrm{d}S + \mathbf{v} \cdot \nabla S\,\mathrm{d}t + \sum_{i} \boldsymbol{\xi}_{i} \cdot \nabla S \circ \mathrm{d}W^{i}_{t} = 0\,,\tag{15}$$

for temperature and salinity respectively. The horizontal component of the momentum equation (9), along with the solutions Eq. (10) for pressure (with the equation of state *b* = *b(T , S, z)*), the incompressibility conditions Eq. (7), the tracer advection equations (14) and (15) and the linear free surface equation (13) give us a complete set of fluid equations, the Primitive Equations with SALT and SFLT.

#### *2.2 Conservation Laws*

The key benefit of the SALT and SFLT frameworks is that they retain some of the fundamental conservation properties possessed by the deterministic equations. By writing the Primitive Equations in the geometric form given in Eqs. (5d)-(5f) and (8), we may demonstrate the effect of the stochastic frameworks on these conservation laws. First, we consider energy conservation. The total energy is equal to the Hamiltonian, as given in Eq. (2). For convenience of notation, we define $\tilde{h}(\mathbf{m}, D, T, S, w) = h(\mathbf{m}^h, D, T, S) + \langle m_3, w\rangle$. $h$ and $\tilde{h}$ are equal on solutions of the equations, but we have $\delta\tilde{h}/\delta\mathbf{m} = \mathbf{v}$. By direct calculation, the time evolution of the energy is given by:

$$\begin{split} \mathrm{d}h &= \sum_{i} \left[ \left\langle \frac{\delta \tilde{h}}{\delta \mathbf{m}}, \mathcal{L}_{\boldsymbol{\xi}_{i}} \mathbf{m}^{h} \right\rangle + \left\langle g(1+b), \xi_{i}^{(z)} \right\rangle \right] \circ \mathrm{d}W_{t}^{i} \\ &\quad - \left\langle \frac{\delta \tilde{h}}{\delta \mathbf{m}}, (\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi}) \mathbf{F} \right\rangle . \end{split} \tag{16}$$

Thus, the energy conservation property is violated by the stochastic terms. The two terms on the right-hand side of Eq. (16) come from SALT and SFLT respectively. However, as shown in [HH21], the energy deviation from SFLT can be nullified by choosing $(\mathrm{d} + \mathcal{L}_{\mathrm{d}\chi})\mathbf{F} = \sum_I \mathcal{L}_{\mathbf{v}}\boldsymbol{\phi}_I \circ \mathrm{d}B^I_t$ for some parameters $\boldsymbol{\phi}_I(\mathbf{x})$. Indeed, by the anti-symmetry of the vector field commutator:

$$\left\langle \frac{\delta \tilde{h}}{\delta \mathbf{m}}, \sum_{I} \mathcal{L}_{\mathbf{v}} \boldsymbol{\phi}_{I} \circ \mathrm{d}B_{t}^{I} \right\rangle = \left\langle \left[ \frac{\delta \tilde{h}}{\delta \mathbf{m}}, \mathbf{v} \right], \sum_{I} \boldsymbol{\phi}_{I} \circ \mathrm{d}B_{t}^{I} \right\rangle = 0 \,,\tag{17}$$

where the square bracket $[\cdot,\cdot]$ denotes the commutator of vector fields. Thus, energy conservation is broken by SALT but preserved by this class of stochastic forcing in SFLT. In the remainder of the paper, we shall assume the stochasticity introduced by SFLT is of this energy-preserving form.

The next conservation law we consider is the Kelvin circulation theorem. The evolution of the circulation corresponding to Eq. (8) is given by:

$$\mathrm{d} \oint_{C(t)} \frac{\mathbf{m}^h}{D} \cdot d\mathbf{x} = -g \oint_{C(t)} b(T, S, z)\, dz \,\mathrm{d}t + \sum_I \oint_{C(t)} (\operatorname{curl} \boldsymbol{\phi}_I \times \mathbf{v}) \cdot d\mathbf{x} \circ \mathrm{d}B_t^I\,,\tag{18}$$

where *C(t)* is a closed loop moving with the transport velocity d*χ*. We see that SALT affects the circulation theorem only by modifying the advection of the loop; thus the circulation theorem for SALT is the same as in the deterministic case, but with the circulation considered around a stochastically-transported loop. Therefore, circulation is generated only by buoyancy gradients being misaligned with the vertical direction. In SFLT, on the other hand, there are additional forces introduced, which generate the circulation of fluid momentum.

The evolution of potential vorticity associated with Eq. (8) can be expressed as

$$\left(\mathrm{d} + \mathrm{d}\boldsymbol{\chi} \cdot \nabla\right)q = \frac{1}{D}\boldsymbol{\omega} \cdot \nabla\left(\frac{\partial b}{\partial z}\,\mathrm{d}\chi^{(z)}\right) + \frac{1}{D}\nabla b \cdot \sum_{I} \left[\nabla \cdot (\mathbf{v}\boldsymbol{\omega}_{I}) - \boldsymbol{\omega}_{I} \cdot \nabla \mathbf{v}\right] \circ \mathrm{d}B_{t}^{I}\,,\tag{19}$$

where $\boldsymbol{\omega} := \operatorname{curl}(\mathbf{m}^h/D)$ is the relative vorticity, $\boldsymbol{\omega}_I = \operatorname{curl}\boldsymbol{\phi}_I$ is the stochastic vorticity generated by SFLT, and $q := \frac{1}{D}\boldsymbol{\omega}\cdot\nabla b$ is the potential vorticity. As for the Kelvin circulation theorem, SALT introduces stochasticity in the transport velocity $\mathrm{d}\chi$, while SFLT introduces stochastic forces that act on the advection of fluid potential vorticity. If the buoyancy has no explicit dependence on the vertical coordinate, i.e. $\partial b/\partial z = 0$, then $q$ is purely advected by the flow in the absence of SFLT.

#### **3 Calibration of the Stochastic Parameters**

#### *3.1 Lagrangian Paths*

In order to calibrate the parameters *ξ <sup>i</sup>* used in SALT we propose to use the method of Lagrangian paths introduced in [CCH+19, CCH+20].

First, we perform a fine-grid model run, which we take to be the 'truth'. From this run we obtain an output velocity $\mathbf{v}(\mathbf{x},t)$ saved at times $t \in \{t_1, \dots, t_{N-1}, t_N\}$, where the interval between subsequent sample times, $t_{i+1} - t_i$, is greater than the velocity decorrelation time, defined to be the smallest $\tau$ at which the auto-correlation function $C(\tau)$ falls below $e^{-1}$. Suppose the fine-grid resolution is $M$ times that of the coarse grid, in which case the coarse-grid time step is $\Delta t_c := M\Delta t_f$, where $\Delta t_f$ is the time step of the fine-grid model run. In order to compute Lagrangian paths we also save $\mathbf{v}(\mathbf{x},t)$ at $t \in \{t_i,\, t_i + \Delta t_f, \dots, t_i + (M-1)\Delta t_f\}$ for each $i = 1, \dots, N$.
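The decorrelation-time criterion above can be checked directly from a saved velocity time series. A minimal sketch, assuming a one-dimensional numpy series and a simple biased sample estimate of the auto-correlation (the estimator choice is ours):

```python
import numpy as np

def decorrelation_time(u, dt):
    """Smallest lag tau at which the sample auto-correlation C(tau)
    of the time series u (sampled every dt) drops below exp(-1)."""
    u = u - u.mean()                       # remove the time mean
    var = np.dot(u, u) / len(u)            # C(0), the sample variance
    for lag in range(1, len(u)):
        c = np.dot(u[:-lag], u[lag:]) / (len(u) - lag) / var
        if c < np.exp(-1):
            return lag * dt
    return len(u) * dt                     # never decorrelates in-sample
```

For white noise the lag-1 correlation is already near zero, so the estimated decorrelation time is one sampling interval; strongly persistent series return correspondingly longer times.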

To obtain the corresponding coarse-grid velocity $\bar{\mathbf{v}}(\bar{\mathbf{x}},t)$ from $\mathbf{v}(\mathbf{x},t)$, we apply a coarse-graining operator, consisting of a local average over fine-grid points, to obtain a velocity $\bar{\mathbf{v}}(\bar{\mathbf{x}},t)$ defined on the coarse grid. Considering a distribution of tracer particles whose initial positions $\mathbf{x}^r_0$ are the (three-dimensional) coordinates of the coarse-grid nodes (enumerated by $r$), we compute Lagrangian paths on the fine grid and coarse grid respectively:

$$\mathbf{x}_f^r \left( t_i + M \Delta t_f \right) := \mathbf{x}_0^r + \sum_{m=0}^{M-1} \mathbf{v} \left( \mathbf{x}_f^r \left( t_i + m \Delta t_f \right), t_i + m \Delta t_f \right) \Delta t_f \,, \tag{20a}$$

$$\mathbf{x}_c^r \left( t_i + M \Delta t_f \right) := \mathbf{x}_0^r + \bar{\mathbf{v}} \left( \mathbf{x}_c^r(t_i), t_i \right) \Delta t_c \,, \tag{20b}$$

where $\mathbf{x}^r_f(t_i + M\Delta t_f)$ and $\mathbf{x}^r_c(t_i + M\Delta t_f)$ are the Lagrangian paths computed as integral curves of $\mathbf{v}$ and $\bar{\mathbf{v}}$ respectively; the integral is carried out over one coarse-grid time step, which is equivalent to $M$ fine-grid time steps. We can then define the difference $\Delta\mathbf{x}^{r,i} = \Delta\mathbf{x}(t_i, \mathbf{x}^r_0) := \mathbf{x}^r_f(t_i + M\Delta t_f) - \mathbf{x}^r_c(t_i + M\Delta t_f)$ and apply the method of [HJS07] to compute the Empirical Orthogonal Functions (EOFs). To summarise, we subtract off the time mean to define $\Delta\mathbf{x}'^{\,r,i} := \Delta\mathbf{x}^{r,i} - \frac{1}{N}\sum_{i=0}^{N-1}\Delta\mathbf{x}^{r,i}$. In the $x$-direction we then have a matrix with components $\Delta x'^{\,r,i}$. From this we construct the matrix $\Lambda^{(x)}$ with components $\Lambda^{(x)}_{rs} = \frac{1}{N}\sum_{i=0}^{N-1}\Delta x'^{\,r,i}\,\Delta x'^{\,s,i}$. The EOFs in the $x$-direction are then defined to be the eigenvectors of the matrix $\Lambda^{(x)}$, which we denote $a^{(x)}_i$ for $i = 1,\dots,N$. They are normalised in the sense that $\sum_{\mathbf{x}} a^{(x)}_i(\mathbf{x})\, a^{(x)}_j(\mathbf{x}) = \delta_{ij}$, where the sum is over all grid points. We apply the same process to the $y$-component $\Delta y'^{\,r,i}$ to obtain $N$ eigenvectors in the $y$-direction, which we denote $a^{(y)}_i$. We do not compute eigenvectors for the $z$-direction, since these will be obtained from the incompressibility condition.
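The covariance construction and eigendecomposition just described can be sketched in a few lines of numpy. This is an illustration of the method, not the [HJS07] code; `compute_eofs` is a hypothetical helper name:

```python
import numpy as np

def compute_eofs(dx):
    """EOFs of one component of the Lagrangian-path differences.
    dx has shape (N, R): N snapshots, R coarse-grid nodes.
    Returns eigenvalues (descending) and EOFs as rows, normalised so
    that sum_x a_i(x) a_j(x) = delta_ij."""
    dx = dx - dx.mean(axis=0)               # remove the time mean
    lam = dx.T @ dx / dx.shape[0]           # covariance Lambda_rs
    evals, evecs = np.linalg.eigh(lam)      # symmetric eigendecomposition
    order = np.argsort(evals)[::-1]         # sort eigenvalues descending
    return evals[order], evecs[:, order].T
```

Because `eigh` returns an orthonormal eigenbasis, the rows automatically satisfy the normalisation stated in the text.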

We remark that the method used here, in which we compute the EOFs of each component of $\Delta\mathbf{x}$ separately, differs from the method found in other sources [e.g. HLB96], in which the components are computed together, immediately yielding a set of two-component eigenvectors $\mathbf{a}_i$ with one eigenvalue $\lambda_i$ corresponding to each EOF. However, when that method was attempted for SALT runs in the current set-up, the results of the model runs were less successful. For this reason we have chosen to compute the components separately.

Thus, in our case we have $N$ eigenvectors in each of the horizontal directions, with associated eigenvalues $\lambda^{(x)}_i$ and $\lambda^{(y)}_i$. We define the horizontal components of $\boldsymbol{\xi}_i$ by a re-scaling of these eigenvectors. The magnitude of the eigenvalue $\lambda^{(x)}_i$ indicates how much of the variance is captured by the corresponding eigenvector. Therefore, we choose to scale the parameters so that $\langle\boldsymbol{\xi}^{(h)}_i, \boldsymbol{\xi}^{(h)}_i\rangle \propto \lambda_i$. Moreover, in order to ensure that the different methods for computing $\boldsymbol{\xi}_i$ may be compared fairly, we require that the $L^2$-norm of the sum be the same for each method. Thus we impose the following:

$$\frac{1}{V_{\mathrm{tot}}} \sum_{i=1}^{N_{\xi}} \left\langle \boldsymbol{\xi}_{i}^{(h)}, \boldsymbol{\xi}_{i}^{(h)} \right\rangle = \gamma^{2}\,, \tag{21}$$

where $\gamma$ is a constant with units $\mathrm{m\,s^{-1/2}}$, which we shall choose later; $V_{\mathrm{tot}}$ is the total volume of the domain, and $N_\xi \le N$ is the number of EOFs we choose to keep for our model runs. The total integral, denoted by angle brackets, is defined by $\langle\mathbf{a}, \mathbf{b}\rangle := \sum_{\mathbf{x}} \mathbf{a}(\mathbf{x})\cdot\mathbf{b}(\mathbf{x})\,V(\mathbf{x})$. We can achieve the required properties by choosing the following scaling:

$$\xi_{i}^{(x)}(\mathbf{x}) = \gamma\,\sqrt{\frac{\lambda_{i}^{(x)}}{\lambda_{\mathrm{tot}}}\cdot\frac{V_{\mathrm{tot}}}{V(\mathbf{x})}}\; a_{i}^{(x)}(\mathbf{x})\,, \tag{22}$$

where $V(\mathbf{x})$ is the volume of the grid cell located at $\mathbf{x}$ and we have defined $\lambda_{\mathrm{tot}} := \sum_{i=1}^{N_\xi}\big(\lambda^{(x)}_i + \lambda^{(y)}_i\big)$. After computing the horizontal components in this way, the $\boldsymbol{\xi}^{(h)}_i$ are smoothed to zero near the boundaries in order to enforce the impermeability condition $\boldsymbol{\xi}^{(h)}_i \cdot \mathbf{n} = 0$, where $\mathbf{n}$ is the normal to the boundary and $\boldsymbol{\xi}^{(h)}_i = (\xi^{(x)}_i, \xi^{(y)}_i)$ is the horizontal part of $\boldsymbol{\xi}_i = (\xi^{(x)}_i, \xi^{(y)}_i, \xi^{(z)}_i)$.
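The rescaling in Eqs. (21)-(22) can be sketched as follows. `rescale_eofs` is a hypothetical helper; we assume EOFs normalised as in the text, $\sum_{\mathbf{x}} a_i(\mathbf{x}) a_j(\mathbf{x}) = \delta_{ij}$, and our reading of Eq. (22) in which the square root extends over the volume ratio:

```python
import numpy as np

def rescale_eofs(eofs_x, eofs_y, evals_x, evals_y, V, gamma):
    """Scale horizontal EOFs to the xi parameters following
    xi_i = gamma * sqrt(lambda_i / lambda_tot * V_tot / V) * a_i,
    so that (1/V_tot) * sum_i <xi_i^(h), xi_i^(h)> = gamma**2,
    with <a, b> = sum_x a(x) b(x) V(x).

    eofs_x, eofs_y : arrays (N_xi, R) of EOF rows; V : cell volumes (R,)
    """
    lam_tot = evals_x.sum() + evals_y.sum()
    V_tot = V.sum()
    scale = np.sqrt(V_tot / V)             # pointwise volume factor
    xi_x = gamma * np.sqrt(evals_x / lam_tot)[:, None] * scale * eofs_x
    xi_y = gamma * np.sqrt(evals_y / lam_tot)[:, None] * scale * eofs_y
    return xi_x, xi_y
```

A direct check that the constraint of Eq. (21) holds is then a one-liner: summing $(\xi^{(x)})^2 + (\xi^{(y)})^2$ weighted by cell volume and dividing by $V_{\mathrm{tot}}$ should return $\gamma^2$.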

For the $z$-component we use the incompressibility condition Eq. (7) along with the impermeability condition $\boldsymbol{\xi}_i \cdot \nabla(z + H) = 0$ at the lower boundary $z = -H$ to obtain:

$$\xi_i^{(z)} = -\nabla^{(h)} \cdot \int_{-H}^{z} \boldsymbol{\xi}_i^{(h)}\, dz' \,, \tag{23}$$

where $\nabla^{(h)} = (\partial/\partial x, \partial/\partial y)$ is the horizontal gradient. This method for computing the vertical component of $\boldsymbol{\xi}_i$ is applicable to any system of fluid equations with an incompressibility condition. We could, alternatively, compute all three components of $\boldsymbol{\xi}_i$ as EOFs of the three components of $\Delta\mathbf{x}$. However, the resulting three-component vector $\boldsymbol{\xi}_i$ would not be guaranteed to be divergence-free. We would then need to subtract off the divergent part, $\boldsymbol{\xi}_i \to \boldsymbol{\xi}'_i = \boldsymbol{\xi}_i - \nabla\Delta^{-1}\nabla\cdot\boldsymbol{\xi}_i$, where $\Delta^{-1}$ is the inverse Laplacian. Computing the divergent part of $\boldsymbol{\xi}_i$ is computationally expensive; moreover, the components of $\boldsymbol{\xi}_i$ computed in this way would not be guaranteed to be orthogonal with respect to $\langle\cdot,\cdot\rangle$. Thus in this paper we consider only the $\boldsymbol{\xi}_i$ whose vertical components are computed by integrating the incompressibility condition.
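Since the horizontal divergence commutes with the vertical integral, Eq. (23) reduces to a cumulative vertical integral of $\nabla^{(h)}\cdot\boldsymbol{\xi}^{(h)}_i$. A minimal sketch on a single vertical column, taking the precomputed horizontal divergence as input (a simplification relative to the unstructured FESOM2 mesh):

```python
import numpy as np

def vertical_component(div_h, z):
    """Eq. (23): xi^(z)(z) = - int_{-H}^{z} div_h(xi^(h)) dz',
    given div_h, the horizontal divergence of xi^(h), sampled on
    levels z with z[0] = -H at the bottom and z increasing upward.
    The bottom value is zero by the impermeability condition."""
    dz = np.diff(z)
    mid = 0.5 * (div_h[:-1] + div_h[1:])    # trapezoidal quadrature
    return -np.concatenate(([0.0], np.cumsum(mid * dz)))
```

For a vertically constant divergence $c$ this gives $\xi^{(z)}(z) = -c\,(z + H)$, vanishing at the bottom as required.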

#### *3.2 Eulerian Differences*

To calibrate the parameters *φ<sup>I</sup>* used in SFLT we propose an alternative method by using differences in fixed Eulerian coordinates. Consider the deterministic momentum equation given by:

$$(\mathrm{d} + \mathcal{L}_{\mathbf{v}\mathrm{d}t}) \left(\mathbf{m}^h\right) = -\left(p + \frac{\delta h}{\delta D}\right) \diamond D\,\mathrm{d}t - \frac{\delta h}{\delta T} \diamond T \,\mathrm{d}t - \frac{\delta h}{\delta S} \diamond S\,\mathrm{d}t\,,\tag{24}$$

and the SFLT equation:

$$(\mathrm{d} + \mathcal{L}_{\bar{\mathbf{v}}\mathrm{d}t}) \left(\bar{\mathbf{m}}^h\right) - \mathcal{L}_{\bar{\mathbf{v}}} (\mathrm{d}\Phi) = -\left(p + \frac{\delta h}{\delta \bar{D}}\right) \diamond \bar{D}\,\mathrm{d}t - \frac{\delta h}{\delta \bar{T}} \diamond \bar{T}\,\mathrm{d}t - \frac{\delta h}{\delta \bar{S}} \diamond \bar{S}\,\mathrm{d}t\,,\tag{25}$$

where the bar notation $\bar{(\cdot)}$ is used on the variables of the SFLT equation to emphasise the difference between deterministic and stochastic variables. The goal of the stochastic parameterisation is to decompose the "true" fluid flow into a slow drift component and a rapidly fluctuating component whose amplitude can be estimated from data. In the example of estimating the momentum fluctuation $\mathrm{d}\Phi$, we treat $\bar{\mathbf{m}}^h$ as the slow drift component and seek the solution to the minimisation problem

$$\min\_{\mathbf{d}\Phi} \mathbb{E}\left[ \left| \mathbf{d} \mathbf{m}^h - \mathbf{d} \bar{\mathbf{m}}^h \right|^2 \right]. \tag{26}$$

Assuming *D, T* and *S* do not have rapidly fluctuating components, the minimisation problem becomes

$$\min\_{\mathbf{d}\Phi} \mathbb{E}\left[ \left| \mathcal{L}\_{\mathbf{v}}(\mathbf{m}^{h}\mathbf{d}t) - \mathcal{L}\_{\tilde{\mathbf{v}}}(\mathbf{\bar{m}}^{h}\mathbf{d}t - \mathbf{d}\Phi) \right|^{2} \right]. \tag{27}$$

We see that this minimisation problem can be solved by taking $\mathrm{d}\Phi = (\mathbf{m}^h - \bar{\mathbf{m}}^h)\,\mathrm{d}t = (\mathbf{u} - \bar{\mathbf{u}})\,\mathrm{d}t$. Therefore, we define the differences

$$
\Delta \mathbf{x}\_{r,I} := \Delta \mathbf{x}(t\_I, \mathbf{x}\_0^r) = \left[ \mathbf{u}(t\_I, \mathbf{x}\_0^r) - \bar{\mathbf{u}}(t\_I, \mathbf{x}\_0^r) \right] \Delta t\_c \tag{28}
$$

for $I = 1, \dots, N$. We then assume the expansion $\mathrm{d}\Phi = \sum_{I=1}^N \boldsymbol{\phi}_I \circ \mathrm{d}B^I_t$. As before, we subtract the time mean to obtain $\Delta\mathbf{x}'_{r,I} = \Delta\mathbf{x}_{r,I} - \frac{1}{N}\sum_{I=0}^{N-1}\Delta\mathbf{x}_{r,I}$ and then compute the EOFs exactly as in Sect. 3.1 to obtain our parameters $\boldsymbol{\phi}_I$.
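The full Eulerian-differences pipeline for SFLT, namely forming the increments of Eq. (28), removing the time mean, and extracting EOFs, can be sketched as below. This is illustrative only; it operates on one velocity component at a time and `sflt_parameters` is our naming:

```python
import numpy as np

def sflt_parameters(u_fine, u_coarse, dt_c, n_phi):
    """Sketch of the Eulerian-differences calibration for SFLT.
    u_fine   : coarse-grained fine-grid ('truth') velocity, shape (N, R)
    u_coarse : coarse-grid model velocity at the same times, shape (N, R)
    Returns the leading n_phi eigenvalues and EOFs of the increments
    Delta x_{r,I} = (u - u_bar) * dt_c."""
    dx = (u_fine - u_coarse) * dt_c        # Eq. (28) increments
    dx = dx - dx.mean(axis=0)              # remove the time mean
    cov = dx.T @ dx / dx.shape[0]          # spatial covariance matrix
    evals, evecs = np.linalg.eigh(cov)
    order = np.argsort(evals)[::-1][:n_phi]
    return evals[order], evecs[:, order].T
```

If the fine-coarse mismatch projects onto a single spatial pattern, that pattern is recovered (up to sign) as the leading EOF.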

In both methods we initially compute horizontal components of the stochastic parameters using EOFs, but for SALT there is the additional step of integrating the incompressibility condition to obtain the vertical component. The vertical component is not needed for *φ<sup>I</sup>* since it is a part of the decomposition of the fluid momentum **m**, the vertical part of which vanishes in the Primitive Equations. In fully three-dimensional models in which the vertical component of the momentum is non-zero, the Eulerian differences of the momenta will be a three-dimensional object and one can compute all three components of the parameters *φ<sup>I</sup>* using EOFs.

We can also consider using Eulerian differences as an option for the $\boldsymbol{\xi}_i$ in SALT. This effectively means approximating the fine-grid Lagrangian path by taking only one time step on the coarse grid: $\mathbf{x}_f \approx \mathbf{x}_0 + \mathbf{v}_f(\mathbf{x}_0, t)\,\Delta t_c$. We can expect this to be a reasonable approximation for small $M$, but for larger $M$ the Lagrangian paths method will diverge from the Eulerian differences. In our numerical investigations of SALT we shall consider $\boldsymbol{\xi}_i$ computed from both the Lagrangian paths method and the Eulerian differences method. For SFLT we also consider $\boldsymbol{\phi}_I$ computed from Lagrangian paths (for completeness) as well as those computed by Eulerian differences as described above.

#### **4 Results**

We solve the Primitive Equations using the FESOM2 code on a rectangular domain $[0^\circ, 40^\circ] \times [30^\circ, 60^\circ] \times [-H, 0]$, where $H = 1600\,\mathrm{m}$ is the depth of the domain and the bathymetry is flat. Impermeability conditions are imposed at all boundaries. The model is spun up for three years from zero initial velocity and an initial temperature profile given by $T(z) = T_0 + \frac{\lambda}{\alpha\rho_0}\left[(1-\beta)\tanh\left(\frac{z}{z_0}\right) + \beta\,\frac{z}{H}\right]$, based on the test case described in [RDH+12, SDB+16]. We take $T_0 = 25\,^\circ\mathrm{C}$, $\beta = 0.05$, $\lambda = 5\,\mathrm{kg\,m^{-3}}$, $z_0 = 300\,\mathrm{m}$, $\rho_0 = 1030\,\mathrm{kg\,m^{-3}}$, and $\alpha = 0.00025\,\mathrm{K^{-1}}$. For simplicity, salinity is kept constant and we use a linear equation of state depending only on temperature: $b = -\alpha(T - 10\,^\circ\mathrm{C})$. The flow is driven by a wind forcing in the upper layer given by $\tau(x,y) = -\frac{\tau_0}{\Delta z_0\,\rho_0}\cos\left(\frac{\pi y}{15^\circ}\right)\hat{\mathbf{x}}$, where $\Delta z_0 = 10\,\mathrm{m}$ is the thickness of the upper layer and $\tau_0 = 0.2\,\mathrm{m\,s^{-2}}$ is the wind strength. The vertical discretisation consists of 23 layers, with layer thicknesses increasing with depth. For the horizontal discretisation we take a fine grid of spacing $1/4^\circ$ and a coarse grid of spacing $1/2^\circ$. At the latitudes we are considering, $1/4^\circ$ corresponds to an eddy-permitting model, while $1/2^\circ$ may be considered non-eddy-resolving [see Hal13]. We run the deterministic model on the fine grid and the coarse grid, and carry out the SALT and SFLT runs on the coarse grid.
All coarse-grid runs are begun from the same initial condition, namely the final snapshot after the three-year spin-up period; the fine-grid run is begun from the end of the three-year spin-up on the fine grid. We save data in each case at intervals of 15 days over a period of 10 years, for a total of 240 snapshots. From the fine-grid data we have the 'truth' velocity $\mathbf{v}_f$. To this we apply a coarse-graining to obtain $\bar{\mathbf{v}}$; we then follow the procedures outlined in Sect. 3 to compute $\boldsymbol{\xi}^{(h)}_i$ and $\boldsymbol{\phi}_I$. However, there is no canonical choice for how the coarse-graining should be done. We consider a filter defined by an equally-weighted nine-point average over nearest neighbours, which we denote $\mathcal{F}$; applied once, this filter has a width equal to the coarse-grid spacing, i.e. $1/2^\circ$. The coarse-graining is then done by applying this filter $N_{filt}$ times successively and projecting onto the coarse grid. The smoothing filter applied $N_{filt}$ times is denoted $\mathcal{F}^{N_{filt}}$; it has a width of $N_{filt}/2$ degrees, with a stronger weighting for points closer to the centre of the filter. We consider the cases $N_{filt} = 1, 4, 32$.
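The repeated nine-point filter $\mathcal{F}^{N_{filt}}$ is simple to sketch on a regular array. The boundary treatment below (edge replication) is our guess, since the chapter does not specify how the filter behaves at the domain boundary:

```python
import numpy as np

def smooth(field, n_filt):
    """Apply n_filt passes of the equally-weighted nine-point
    nearest-neighbour average F over a 2-D field, with edge
    replication at the boundary (an assumed boundary treatment)."""
    f = field.astype(float)
    for _ in range(n_filt):
        padded = np.pad(f, 1, mode="edge")
        # sum the 3x3 neighbourhood via nine shifted views
        f = sum(padded[i:i + f.shape[0], j:j + f.shape[1]]
                for i in range(3) for j in range(3)) / 9.0
    return f
```

Repeated application widens the effective stencil while weighting the centre more strongly, consistent with the description of $\mathcal{F}^{N_{filt}}$ above; a constant field is left unchanged.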

From the deterministic model run, we have velocities saved at 240 time snapshots, so we can use these to compute 240 EOFs. We do this for both the Lagrangian paths method and the Eulerian differences method, for each of the three choices of $N_{filt}$; this gives a total of six sets of parameters. In our model runs we choose to keep $N_\xi = N_\phi = 32$ of these parameters for each run. In Fig. 1 we plot the square root of the sum of the squares of these parameters (before re-scaling by $\gamma$) as a field in space. From Fig. 1 it appears that the differences between the Lagrangian paths and Eulerian differences parameters are minimal. We remark that here the time steps on the fine and coarse grids differ only by a factor of 2; if a bigger difference in resolution were used, more steps would be needed in computing the Lagrangian paths and the corresponding parameters would differ more substantially. The number of times we apply the smoothing operator, however, has a much greater effect, and we see significantly different fields with $N_{filt} = 32$ than with $N_{filt} = 4$ or $N_{filt} = 1$. Indeed, it appears from Fig. 1 that the weaker filter causes the parameters to be more strongly concentrated around the western boundary, whereas for the stronger filter the parameters are spread more evenly across the domain.

The cumulative spectra of the EOFs are shown in Fig. 2. These spectra show how many EOFs are needed to capture a given percentage of the total variability, or conversely, how much variance is captured by a given number of EOFs. We show in each case how much variability is captured by using 32 EOFs. In all cases the Lagrangian paths method captures slightly more variability, though

**Fig. 1** $\frac{1}{\gamma}\left(\frac{1}{N_\xi}\sum_{i=1}^{N_\xi} \xi_i^{(h)} \cdot \xi_i^{(h)}\right)^{1/2}$ in the upper fluid layer for different methods of computing $\xi_i^{(h)}$. Top row: $\xi_i^{(h)}$ computed from Lagrangian paths for different strengths of smoothing filter. Bottom row: $\xi_i^{(h)}$ computed from Eulerian differences for different strengths of smoothing filter

**Fig. 2** Eigenvalue spectra of zonal $\xi_i$, plotted for three different values of $N_{filt}$. Each panel shows the spectrum for the EOFs calculated by Lagrangian trajectories and by Eulerian differences. The horizontal lines show the percentage of the total variance captured by choosing $N_\xi = 32$ EOFs

the difference is small, especially for the smaller values of $N_{filt}$. Much more variability is captured, however, in the $N_{filt} = 32$ case than in the $N_{filt} = 1$ case.

We implemented SALT and SFLT in FESOM2 (see Appendix) and ran the model with each choice of parameters and with the appropriate re-scaling as detailed above. For all SALT runs we use $N_\xi = 32$ with the scaling $\gamma = 2 \times 10^{-3}\,\mathrm{m\,s^{-1/2}}$. For SFLT we also take $N_\phi = 32$ but scale the parameters with $\gamma = 10^2\,\mathrm{m\,s^{-1/2}}$. This re-scaling is chosen empirically, taking $\gamma$ to be the largest value that does not result in model blow-up. The magnitude of the parameters we are able to use for SFLT is much higher. This is possibly because SFLT does not involve any direct modification of the tracer equation. SALT, on the other hand, includes advection of the temperature by the stochastic transport velocity; using higher values for this velocity may destabilise the tracer equation and cause model blow-up.

The results of these runs are shown in Figs. 3, 4 and 5. Figure 3 shows the eddy kinetic energy (EKE), defined by $E = \frac{1}{2}|\mathbf{u} - \bar{\mathbf{u}}|^2$, where $\bar{\mathbf{u}}$ is the time-averaged velocity. We notice that the eddy kinetic energy is significantly lower in the coarse-grid deterministic run than in the fine-grid run. This is probably because small scales are less present in the coarse-grid flow, and because the viscosity used in the coarse-grid model is greater, so kinetic energy is dissipated at a faster rate. However, when we include SALT there is, for most choices of $\xi_i$, a notable increase in EKE across the domain, particularly around the western boundary. The exception is the cases in which the coarse-grained velocities $\bar{\mathbf{v}}$ used to calculate $\xi_i$ are defined with only one application of the smoothing operator, as shown in panels (c) and (d) of Fig. 3. This could be because, as Fig. 2 shows, the inclusion of 32 $\xi_i$ then captures a smaller amount of the total variability; it may also be that the effect of

**Fig. 3** Time-average of eddy kinetic energy at depth 16 m below the surface. Panel (**a**) is from the high-resolution ($1/4^\circ$) deterministic model, while (**b**) is from the low-resolution ($1/2^\circ$) deterministic model. Panels (**c**), (**g**), (**k**) are the results of model runs at $1/2^\circ$ with SALT, where the $\xi_i$ are computed from Lagrangian paths using a coarse velocity defined by applying the smoothing filter 1, 4 and 32 times respectively. Panels (**d**), (**h**), (**l**) are also SALT runs, but with $\xi_i$ computed from Eulerian differences rather than Lagrangian trajectories. Panels (**e**), (**i**), (**m**) are SFLT runs with $\phi_I$ computed from Lagrangian trajectories, while (**f**), (**j**), (**n**) have $\phi_I$ computed from Eulerian differences

**Fig. 4** Spectra of eddy kinetic energy for SALT (left panel) with $\xi_i$ calculated from Lagrangian paths and from Eulerian differences; and for SFLT (right panel) with $\phi_I$ calculated from Lagrangian paths and from Eulerian differences. Also included in each plot are the spectra for the deterministic runs on the fine and coarse grids. Spectra are calculated in the $x$-direction at fixed $y = 45\tfrac{1}{6}^\circ$ by $\hat{E}(k) := \frac{1}{t_{max}} \int_0^{t_{max}} \int E(x, t)\, e^{ikx}\, dx\, \mathrm{d}t$. Here $t_{max} = 10$ years and $t = 0$ corresponds to the beginning of the model run, after spin-up

**Fig. 5** Vertical profiles of temperature horizontally-averaged across the domain after 10 years of model time. The left-hand panel shows the results from the SALT runs, alongside the deterministic runs. The right-hand panel shows the results from the SFLT runs, alongside the deterministic runs

the $\xi_i$ is more spread out across the domain, as shown in Fig. 1, which overall has a greater impact than having them more highly concentrated in one region. For SFLT there is only a modest improvement in the EKE field, and the effect is similar for all choices of the parameters. In all cases there appears to be little difference between the Eulerian differences method and the Lagrangian paths method when the same $N_{filt}$ is used.
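The time-averaged EKE diagnostic defined above, $E = \frac{1}{2}|\mathbf{u} - \bar{\mathbf{u}}|^2$, can be computed as follows. This is a generic sketch for gridded velocity snapshots, not the exact diagnostic routine used for the figures:

```python
import numpy as np

def eddy_kinetic_energy(u, v):
    """u, v: velocity components of shape (n_time, ny, nx).
    Returns the time-averaged EKE field 0.5 * <|u - u_bar|^2>,
    with u_bar the time-mean velocity at each grid point."""
    up = u - u.mean(axis=0)   # velocity fluctuations about the time mean
    vp = v - v.mean(axis=0)
    return 0.5 * (up**2 + vp**2).mean(axis=0)
```

By construction a steady flow has zero EKE, so this diagnostic isolates the transient eddy activity that the stochastic parameterisation is intended to restore.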

We can also consider the spatial spectra, as shown in Fig. 4. There we see that the $1/4^\circ$ run contains higher EKE at all scales than the low-resolution run. Every SALT run succeeds in increasing the energy at almost all scales and in shifting the

**Fig. 6** Time series of spatially-averaged temperature fields for SALT runs at *z* = −5 m (left panel) and *z* = −1350 m (right panel)

spectrum towards that of the $1/4^\circ$ run. The most significant improvements are seen in the run with Eulerian parameters computed with $N_{filt} = 32$; in contrast, there is only a small change from the deterministic run when the $N_{filt} = 1$ Eulerian parameters are used. For SFLT the improvement is again less noticeable, with all choices of parameters giving only a slight increase in EKE at all scales.
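A discrete analogue of the zonal spectrum $\hat{E}(k)$ used in Fig. 4 can be obtained with a fast Fourier transform in $x$ followed by a time average. The sketch below assumes a uniformly spaced periodic line of EKE data; it is an illustration, not the exact routine used for the figures:

```python
import numpy as np

def eke_spectrum(E_xt, dx):
    """E_xt: EKE along a fixed latitude, shape (n_time, nx).
    Returns wavenumbers k and the time-averaged magnitude of the
    zonal Fourier transform, a discrete analogue of E_hat(k)."""
    n_time, nx = E_xt.shape
    Ek = np.abs(np.fft.rfft(E_xt, axis=1)) * dx   # approximate the x-integral
    k = np.fft.rfftfreq(nx, d=dx) * 2 * np.pi
    return k, Ek.mean(axis=0)
```

Plotting this quantity against $k$ for each run reproduces the kind of comparison shown in Fig. 4: a run with more small-scale energy lifts the tail of the spectrum.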

Since we are working with the Primitive Equations, buoyancy can have a large effect on the fluid flow. We therefore consider the temperature, which determines the buoyancy directly via the linear equation of state. Figure 5 shows vertical temperature profiles at the end of the ten-year run. In the coarse-grid model there is a slightly lower average temperature in the upper layers of the fluid, and slightly higher temperatures in the lower layers. With SALT included, however, there is for some choices of parameters a significant reduction in temperature in the upper layers, while at lower depths the temperature increases relative to the deterministic model. Considering the time series of spatially averaged temperature at $z = -5$ m and $z = -1350$ m in Fig. 6, we see that the downwards-diffusion effects are persistent in time. In the deterministic case the coarse-grid model has a stronger downwards diffusion of temperature than the fine-grid run, and the inclusion of SALT accelerates this downwards-diffusion effect further. It therefore appears that the calibrated stochastic terms we have included in the temperature equation with SALT cause a downwards-diffusion effect. Indeed, an additional SALT run (not shown), in which the stochastic terms were not included in the temperature advection, did not display this behaviour. Further investigation will thus be required to determine how to avoid the excessive downwards diffusion in the tracer equation while maintaining a positive effect on the EKE field. SFLT has very little effect on the temperature field compared to that of the low-resolution model. This is expected, however, since there are no direct stochastic effects in the temperature equation.
Comparing the SALT runs, in which the downwards temperature-diffusion effect is present, against the SFLT runs, we believe that the temperature is the dominant driver of the evolution of the velocity, at least at the resolutions considered here. The limited effect of SFLT on the EKE is then explained by the fact that it does not affect the driving temperature fields directly. It remains part of future work to consider the case where SFLT is added to the temperature field.

#### **5 Summary and Discussion**

This work lays the groundwork for the application of two relatively new stochastic parameterisation frameworks to the Primitive Equations. The first, SALT, has hitherto only been applied to simple idealised ocean models such as QG and 2D Euler. The second, SFLT, had not been investigated numerically prior to the present work. We have demonstrated some of the desirable theoretical properties of the stochastic Primitive Equations with the noise added in these ways, notably the preservation of a circulation theorem for SALT and of energy conservation for SFLT. We have proposed to calculate the parameters $\xi_i$ governing SALT and $\phi_I$ governing SFLT by two different methods: Lagrangian paths and Eulerian differences. We find no significant differences between the two methods, either in the parameters themselves or in the results of model runs. In this case it is preferable to use the Eulerian differences method, as its parameters are computationally less expensive to compute. However, we have used a set-up in which the fine-grid resolution is only 2 times the coarse-grid resolution; a larger ratio of grid resolutions would require more time-steps in computing the Lagrangian paths, and so may give EOFs that differ more significantly from what we have observed here. We do observe, however, large sensitivities to the choice of smoothing used in defining the coarse-grained velocity from which the parameters are calculated. In the SALT case, the model runs using parameters calculated with a strong smoothing filter show a significant improvement in the eddy kinetic energy field at all depths, as well as in the eddy kinetic energy spectrum. In the SFLT case, the improvements in the EKE field and spectrum are more modest than those from SALT, due to the lack of direct stochastic effects on the driving temperature fields.
Considering the temperature profile, however, we observe that SALT causes significant additional downward diffusion when compared with the deterministic model. It remains an open problem to devise a method to avoid this effect. The answer may lie in a different method for configuring the parameters *ξ <sup>i</sup>* or it may be the case that this is a property intrinsic to SALT. In either case, further study is needed in this direction.

The stochastic parameterisation frameworks considered in this paper distil all uncertainties of the ocean model into the stochastic parameters $\xi_i$ and $\phi_I$. However, the effects of these stochastic parameterisations could be limited by the model, both physically and numerically. Examples of such limiting factors for the Primitive Equations are the forcing from the temperature field and the artificial viscosity imposed for numerical stability. The interplay between numerical effects such as artificial viscosity and stochastic parameterisation is particularly interesting for future work, since different numerical viscosities are imposed at different mesh resolutions for numerical stability, which influences the calibration process. Thus, we expect there are limits to the effects of SALT and SFLT in low-resolution simulations where viscosity is dominant. In high-resolution simulations, we expect to see further effects of stochasticity as the influence of viscosity diminishes. After all, the problem of stochastic parameterisation is not just model-dependent; it also depends on the numerical method used to solve the model.

**Acknowledgments** We are grateful to our friends and colleagues who have generously offered their time, thoughts, and encouragement in the course of this work during the time of COVID-19. Thanks to P. Berloff, C. Cotter, S. Danilov, D. Holm, S. Juricke, E. Luesink, W. Pan and all members of the Geometric Mechanics group at Imperial College for their thoughtful comments and discussions. We acknowledge the Alfred Wegener Institute for the use of their computing facilities. The authors are grateful for partial support, as follows. RH for the EPSRC scholarship (Grant No. EP/R513052/1); and SP for the EPSRC Centre for Doctoral Training in the Mathematics of Planet Earth (Grant No. EP/L016613/1). Finally, we thank the anonymous reviewer for giving useful feedback on our manuscript.

#### **Appendix: Numerical Implementation**

In order to apply SALT and SFLT to FESOM2 we adapt the time-stepping scheme to include the appropriate stochastic terms. Details of the original (deterministic) time-stepping are given in [DSWJ16]. We modify the scheme from FESOM2 to a two-step Heun-type method [BBT04]; we choose this because of the use of Stratonovich integrals, to which the Heun method converges. The first step in the method is to compute the modified pressure:

$$\begin{split} \hat{p}_h^n = \rho_0 g \int_{-H}^z b(T^{n+1/2})\, dz' &+ \sum_i \int_{-H}^z \frac{\partial \xi_i}{\partial z} \cdot \mathbf{u}^n\, dz'\, \frac{\Delta W_{n+1}^i}{\Delta t} \\ &+ \sum_I \int_{-H}^z \frac{\partial \phi_I}{\partial z} \cdot \mathbf{u}^n\, dz'\, \frac{\Delta B_{n+1}^I}{\Delta t} \end{split} \tag{29}$$

where $\Delta W_{n+1}^i$ and $\Delta B_{n+1}^I$ are independent, normally-distributed random variables with mean 0 and variance $\Delta t$. For the sake of conciseness we assume that the buoyancy depends only on the temperature $T$ and that salinity is kept constant; extending the method to include additional tracers should, however, be straightforward. The advective, diffusive and pressure parts of the momentum right-hand side are then computed:

$$
\Delta \hat{\mathbf{u}}^{n+1} = \hat{\mathbf{R}}^{n+1/2} - \nabla \left( \hat{p}_h^n + \eta^n \right) \Delta t + \mathbf{D} \left( \mathbf{u}^n, \Delta \hat{\mathbf{u}}^{n+1} \right) \tag{30}
$$

where $\hat{\mathbf{R}}^{n+1/2}$ is an Adams–Bashforth interpolation of the advective and Coriolis terms. Specifically, $\hat{\mathbf{R}}^{n+1/2} = \frac{3}{2}\hat{\mathbf{R}}^n - \frac{1}{2}\hat{\mathbf{R}}^{n-1}$, where

$$\hat{\mathbf{R}}^n = \mathbf{R}\left[\mathbf{v}^n \Delta t, \mathbf{u}^n\right] + \sum_i \left( \mathbf{R}\left[\xi_i, \mathbf{u}^n\right] - \nabla^{(h)}\xi_i \cdot \mathbf{u}^n \right) \Delta W_{n+1}^i - \sum_I \mathbf{R}\left[\mathbf{v}^n, \phi_I\right] \Delta B_{n+1}^I$$

and $\mathbf{R}[\mathbf{v}, \mathbf{u}] := -\nabla \cdot (\mathbf{v}\mathbf{u}) - \mathbf{f} \times \mathbf{v}$. $\mathbf{D}$ includes the horizontal and vertical diffusion terms, as well as the external wind forcing.

The change in free surface height *Δη*ˆ*n*+<sup>1</sup> is computed implicitly:

$$\left(1 - g\Delta t^2\, \nabla \cdot \int_{-H}^0 \nabla \left(\,\cdot\,\right) dz\right) \Delta \hat{\eta}^{n+1} = -\nabla \cdot \int_{-H}^0 \left(\mathbf{u}^n + \Delta \hat{\mathbf{u}}^{n+1}\right) dz\, \Delta t \tag{31}$$

Once this has been solved we can finally compute the stepped-forward horizontal velocity:

$$
\hat{\mathbf{u}}^{n+1} = \mathbf{u}^n + \Delta \hat{\mathbf{u}}^{n+1} - g \Delta t\, \nabla \Delta \hat{\eta}^{n+1} \tag{32}
$$

Then we solve for the total layer thickness $\bar{h}$, which in the continuous case equals the free surface height $\eta$; in the discrete case, however, the two differ, and we compute:

$$
\hat{\bar{h}}^{n+3/2} = \bar{h}^{n+1/2} - \nabla \cdot \int_{-H}^{0} \hat{\mathbf{u}}^{n+1}\, dz\, \Delta t
$$

In our present set-up we then set the free-surface height as a linear interpolation of the total layer heights:

$$
\hat{\eta}^{n+1} = \theta\, \hat{\bar{h}}^{n+3/2} + (1 - \theta)\, \hat{\bar{h}}^{n+1/2} \tag{33}
$$

where $\theta \in [0, 1]$ is an arbitrary parameter, which we set equal to 1.

Since we have the horizontal velocity we may compute the vertical velocity:

$$
\hat{w}^{n+1} = -\nabla \cdot \int_{-H}^{z} \hat{\mathbf{u}}^{n+1}\, dz' \tag{34}
$$

The newly-computed three-dimensional velocity, along with the stochastic SALT velocity, is then used to advect the tracer:

$$\hat{T}^{n+3/2} = T^{n+1/2} - R\_T \left[ T^{n+1/2}, T^{n-1/2}, \hat{\mathbf{v}}^{n+1} \Delta t + \xi\_l \Delta W\_{n+3/2} \right] + K \left[ T^{n+1/2} \right] \tag{35}$$

where $R_T$ denotes the advection scheme and $K$ the diffusion. From these steps we compute intermediate values $\hat{X}^{n+1} := \left(\hat{\mathbf{u}}^{n+1}, \hat{\eta}^{n+1}, \hat{\bar{h}}^{n+3/2}, \hat{T}^{n+3/2}\right)$ from values at the previous two time steps: $\mathbf{u}^n, \mathbf{u}^{n-1}, \bar{h}^{n+1/2}, T^{n+1/2}, T^{n-1/2}$. We may write this schematically as:

$$
\hat{X}^{n+1} = X^n + \mathcal{F}\left[X^n, X^{n-1}\right] \tag{36}
$$

where $\mathcal{F}$ is an operator representing the computations outlined above. For the corrector step we follow the same steps as above to compute $\mathcal{F}\left[\hat{X}^{n+1}, X^n\right]$, and the overall evolution is given by:

$$X^{n+1} = X^n + \frac{1}{2} \left[ \mathcal{F} \left[ X^n, X^{n-1} \right] + \mathcal{F} \left[ \hat{X}^{n+1}, X^n \right] \right] \tag{37}$$

This method differs from the usual Heun method because the right-hand side depends on the previous two time-steps, rather than just the previous one. It remains to prove that adding the stochasticity with this method does converge to the required Stratonovich integrals.
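The predictor-corrector structure of (36)-(37) can be illustrated in isolation. The sketch below applies it to a generic right-hand-side operator `F` depending on the two previous states; it is a schematic of the time-stepping skeleton only, not of the full FESOM2 implementation, and `heun_two_step` is a hypothetical name:

```python
import numpy as np

def heun_two_step(F, x0, x_prev, n_steps):
    """Two-step Heun-type predictor-corrector as in (36)-(37):
    predictor:  x_hat = x + F(x, x_prev)
    corrector:  x_new = x + 0.5 * (F(x, x_prev) + F(x_hat, x))."""
    x, xp = x0, x_prev
    for _ in range(n_steps):
        f1 = F(x, xp)
        x_hat = x + f1                       # predictor (36)
        x, xp = x + 0.5 * (f1 + F(x_hat, x)), x   # corrector (37)
    return x
```

When `F` carries Brownian increments, the averaging of the predictor and corrector evaluations is what makes the scheme consistent with the Stratonovich interpretation; with a deterministic `F` it reduces to the classical Heun method (up to the two-step memory).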

#### **Bibliography**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **A Pathwise Parameterisation for Stochastic Transport**

**Oana Lang and Wei Pan**

**Abstract** In this work we set the stage for a new probabilistic pathwise approach to effectively calibrate a general class of stochastic nonlinear fluid dynamics models. We focus on a 2D Euler SALT equation, showing that the driving stochastic parameter can be calibrated in an optimal way to match a set of given data. Moreover, we show that this model is robust with respect to the stochastic parameters.

#### **1 Introduction**

A fundamental challenge in observational sciences, such as weather forecasting and climate change prediction, is the modelling of uncertainty due, for example, to unknown or neglected physical effects and to incomplete information in both the data and the formulation of the theoretical models used for prediction. Various dynamical parameterisation approaches have been proposed to tackle this challenge, see e.g. [6], [4], [11], [5], [1]. Of particular interest are the recently developed data-driven models, which accommodate uncertainty by predicting both the expected future measurement values and their uncertainties, based on input from measurements and statistical analysis of the initial data. To incorporate uncertainty effectively in the data-driven approach, such predictions are made in a probabilistic sense. Additionally, a data assimilation procedure is used to take into account the time-integrated information obtained from the data observed along the solution path during the forecast interval, as "in-flight corrections".

In the geoscience community, *data assimilation* (DA) refers to a set of methodologies designed to efficiently combine past knowledge of a geophysical system (in the form of a *numerical model*) with new information about that system (in the form of *observations*). DA is a central component of Numerical Weather Prediction where it is used to improve forecasting by adjusting the model parameters and reducing the

Imperial College London, London, UK

O. Lang (-) · W. Pan

e-mail: o.lang15@imperial.ac.uk; wei.pan@imperial.ac.uk

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*,

Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_10

uncertainties. To achieve this, a stochastic feedback loop between the model and the observation may be introduced: the assimilation of more data during the prediction interval will then decrease the uncertainty of the forecasts based on the initial data, by selecting the more likely paths as more observational data is collected. This is the basis of the so-called *ensemble data assimilation* which uses a set of model trajectories that are intermittently updated according to data.

A key step for ensuring the successful application of the combined stochastic parameterisation and data assimilation procedure is the "correct" calibration of the stochastic model parameters. For Stochastic Advection by Lie Transport (SALT) and Location Uncertainty (LU) models, current numerical methods for calibration, see [4], [1], [5], [12], have largely been inspired by the physical interpretation of the models' derivations, more specifically by the assumption that the flow map decouples into a slow-scale mean part and a fast-scale fluctuating part. In the references mentioned above, it was shown that these methods are effective and lead to a successful combination of data-driven models and state-of-the-art data assimilation techniques.

In this work, we wish to investigate the feasibility and viability of a probabilistic pathwise approach to calibration. Our general aim is to explore such ideas for a wide class of nonlinear stochastic transport models. This will be very useful in data assimilation problems, as in real-world applications the signal is usually observed through discrete observations, but no results of this type for SALT or LU models have been obtained before. Currently, Lagrangian particle trajectories are simulated starting from each point on both the physical grid and its refined version, and the differences between the particle positions are used to calibrate the noise. This is computationally expensive and not fully justified from a theoretical perspective. In the same spirit as [3], but with a more complicated noise term and without any smoothing effects of a Laplacian, we propose an approach which uses high-frequency in time and low-frequency in space observations of a single path of the solution to rigorously infer properties of the stochastic parameters. Knowledge of the noise is crucial for determining the behaviour of the solution and for assessing to what degree the solution of the coarse-resolution SPDE deviates from the solution of the fine-resolution PDE in the model reduction procedure, so an optimal calibration of the noise parameters is relevant from both a theoretical and an applied perspective.

In this work we look at stochastic calibration for the two-dimensional incompressible Euler equation in vorticity form. This stochastic equation models the local rotation of a fluid flow in the presence of spatial uncertainties and it has been derived from fundamental principles in [6]. This equation is a key ingredient in modelling phenomena in oceanography and in order to ensure that it efficiently encodes the small-scale variability in the upper part of the ocean, one needs to specify the stochastic parameters based on real observations. One of the main issues in parameter estimation using real data is the fact that the model parameters do not map to observations in a unique way (*model identifiability* problem, see e.g. [2]). For this reason, we believe that a probabilistic approach is much more suitable.

The 2D Euler equation in the form derived in [6] and studied in [4], [5] and [8] is given by:

$$d\omega_t + u_t \cdot \nabla \omega_t\, dt + \sum_{i=1}^{\infty} \xi_i \cdot \nabla \omega_t \circ dW_t^i = 0 \tag{1}$$

where $u = (u_1, u_2)$ is the fluid velocity, $\omega = \operatorname{curl} u = \partial_1 u_2 - \partial_2 u_1$ is the vorticity, $(\xi_i)_i$ are divergence-free time-independent vector fields such that

$$\sum_{i=1}^{\infty} \left\| \xi_i \right\|_{k+1,\infty}^2 < \infty \tag{2}$$

and $(W^i)_{i \in \mathbb{N}}$ is a sequence of independent Brownian motions. Global well-posedness for Eq. (1) has been proven in [8], and the numerical and data assimilation perspective has been studied in [4] and [5]. In [8] the authors showed that Eq. (1) admits a unique pathwise solution which belongs to the Sobolev space $W^{k,2}(\mathbb{T}^2)$ $(k \geq 2)$ when $\omega_0 \in W^{k,2}(\mathbb{T}^2)$, and which can be extended to $L^\infty(\mathbb{T}^2)$ when $\omega_0 \in L^\infty(\mathbb{T}^2)$.

In this paper we consider the following SPDE on the two-dimensional torus $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$, driven by a one-dimensional Brownian motion $W$:

$$
d\omega_t + u_t \cdot \nabla \omega_t\, dt + \xi \cdot \nabla \omega_t \circ dW_t = 0 \tag{3}
$$

where $u$ and $\omega$ are as above and $\circ$ denotes Stratonovich integration. We impose the following condition on the stochastic parameter $\xi$, in the same spirit as (2):

$$\left\| \xi \right\|_{k+1,\infty}^2 < \infty \tag{4}$$

with $k > 4$. This condition ensures that for any $f \in W^{2,2}(\mathbb{T}^2) \cap W^{2,\infty}(\mathbb{T}^2)$,

$$\left\| \xi \cdot \nabla f \right\|_2^2 \leq C \left\| f \right\|_{1,2}^2 \qquad \left\| \xi \cdot \nabla(\xi \cdot \nabla f) \right\|_2^2 \leq C \left\| f \right\|_{2,2}^2 \tag{5}$$

$$\left\| \xi \cdot \nabla f \right\|_\infty^2 \leq C \left\| f \right\|_{1,\infty}^2 \qquad \left\| \xi \cdot \nabla(\xi \cdot \nabla f) \right\|_\infty^2 \leq C \left\| f \right\|_{2,\infty}^2. \tag{6}$$

*Remark 1* We can view the stochastic part as a space-time noise $(\xi, W)$, where the spatial component is given by $\xi$ and the time component is a standard Brownian motion. This perspective is often useful in numerical applications, where $(\xi \circ dW_t) \cdot \nabla$ is implemented as a random operator applied to the solution $\omega$.

The problem of parameter estimation, also known as *statistical inference*, is technically challenging for such (infinite-dimensional) SPDEs driven by transport noise, as most methods in the literature rely on a diagonalizable structure of the underlying spatial covariance matrices. This structure is specific to additive noise and therefore does not apply in our case. Also, most results are obtained for stochastic variations of the heat equation, which contain a smoothing Laplace operator (see for instance [3]). Our model does not contain a Laplacian a priori, so we cannot exploit the properties of a heat kernel. This makes the analysis much harder.

#### **Contributions of the Paper**

In this work, we focus on Eq. (3) from two perspectives:


#### **Structure of the Paper**

In Sect. 2 below we present the problem formulation. In Sect. 3 we introduce the methodology. In Sect. 4 we prove the robustness of the original model and in Sect. 5 we present the numerical results.

#### **2 Problem Formulation**

Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, \mathbb{P})$ be a filtered probability space and $W$ a one-dimensional Brownian motion adapted to the complete and right-continuous filtration $(\mathcal{F}_t)_{t \geq 0}$.

Let $h : \mathbb{R} \to \mathbb{R}$ be a smooth function representing some observation map. We assume we have available a finite sequence of *high-frequency* in time snapshots of observed vorticity fields, denoted $h(\omega^*)_{t_i}(x) := h(\omega^*_{t_i})(x)$, $i = 1, \ldots, N$, and adapted to $(\mathcal{F}_t)_{t \geq 0}$. We take the view that the $h(\omega^*)_{t_i}$'s are the given observation *data*. We further assume that $\omega^*_{t_i} \in W^{k,2}(\mathbb{T}^2)$, $k > 4$.

Writing $\omega_\xi$ to denote the solution of the model (3) for a given vector field $\xi$, the generic problem we are interested in is to find a $\xi$ such that the solution of (3) matches the data as well as possible, i.e.

$$\arg\min_{\xi} \left\| \omega^* - \omega_\xi \right\| \tag{7}$$

for some suitable norm.<sup>1</sup>

The dimension of the observations currently coincides with the number of sources of noise; that is, we have a *determined* system. However, in practice this is not always a realistic assumption, and in future work we will look at *underdetermined* or *overcomplete* systems, i.e. when the number of noise sources is larger than the dimension of the observation operator.

In general, the infinite-dimensional optimisation problem (7) may be too hard to solve in practice. We therefore make concrete the form of $\xi$. Let $(e_j)_{j \in \mathbb{N}}$ be an orthonormal basis of $L^2(\mathbb{T}^2)$. We assume the following parametric form for the stream function of $\xi$, henceforth denoted $\zeta$:

$$\zeta(\mathbf{x}) = \sum_{j=1}^{\infty} \alpha_j e_j, \tag{8}$$

where the $\alpha_j$ are real. Then

$$\xi(\mathbf{x}) = \nabla^{\perp} \zeta(\mathbf{x}) = \sum_{j=1}^{\infty} \alpha_j \nabla^{\perp} e_j(\mathbf{x}) \tag{9}$$

and the optimisation problem (7) then reduces to finding the coefficients *αj* .
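As an illustration of (8)-(9), the sketch below builds $\xi = \nabla^\perp \zeta$ from a finite set of coefficients $\alpha_j$, using a hypothetical sine Fourier basis $e_j(x, y) = \sin(2\pi(kx + ly))$ on the unit torus; since $\nabla^\perp = (-\partial_y, \partial_x)$, each contribution is divergence-free by construction:

```python
import numpy as np

def xi_from_stream(alpha, modes, X, Y):
    """Evaluate xi = grad-perp of zeta = sum_j alpha_j e_j on grid (X, Y),
    with (hypothetical) basis e_j(x, y) = sin(2*pi*(k*x + l*y))."""
    xi_x = np.zeros_like(X)
    xi_y = np.zeros_like(X)
    for a, (k, l) in zip(alpha, modes):
        phase = 2 * np.pi * (k * X + l * Y)
        xi_x += -a * 2 * np.pi * l * np.cos(phase)   # -d zeta / dy
        xi_y += a * 2 * np.pi * k * np.cos(phase)    #  d zeta / dx
    return xi_x, xi_y
```

Parameterising $\xi$ through its stream function in this way guarantees the divergence-free condition required of the noise vector fields, while reducing the calibration problem to the coefficients $\alpha_j$.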

#### **3 Methodology**

For a stochastic process *Xt* defined on a filtered probability space, its *quadratic variation* is defined by

$$[X]_t := \lim_{\max_i \Delta t_i \to 0} \sum_{i=1}^{n} |X_{t_i} - X_{t_{i-1}}|^2, \tag{10}$$

where $t_0 = 0 < t_1 < \cdots < t_n = t$ is a partition of the interval $[0, t]$, $\Delta t_i := |t_i - t_{i-1}|$, and the convergence is in the sense of probability (see e.g. [7]).
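The discrete sum in (10) gives a direct estimator of the quadratic variation from a sampled path, for example:

```python
import numpy as np

def quadratic_variation(path):
    """Realised quadratic variation of a sampled path X_{t_0},...,X_{t_n}:
    the discrete sum of squared increments from (10)."""
    return np.sum(np.diff(path) ** 2)
```

For a standard Brownian motion sampled on $[0, 1]$ the estimator converges to 1 as the partition is refined, while for a smooth path it vanishes, which is what makes the quadratic variation a useful fingerprint of the noise.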

From (3) and (9) we have

$$\omega_t(\mathbf{x}) = \omega_0(\mathbf{x}) - \int_0^t B_s(\mathbf{x}; \omega)\, ds - \int_0^t \sum_j \alpha_j \nabla^{\perp} e_j(\mathbf{x}) \cdot \nabla \omega_s(\mathbf{x}) \circ dW_s \tag{11}$$

where, for notational simplicity, we have introduced $B_s(\mathbf{x}; \omega) := u_s(\mathbf{x}) \cdot \nabla \omega_s(\mathbf{x})$.

<sup>1</sup> By the assumed regularity of $h$, any solution to (7) is also a solution to $\arg\min_\xi \left\| h(\omega^*) - h(\omega_\xi) \right\|$.

Using Itô's lemma, and following standard results on the quadratic variation of semimartingales, it is straightforward to show that

$$[h(\omega)]_t = \sum_{i,j=1}^{\infty} \alpha_i \alpha_j \int_0^t \langle h'(\omega_s),\, \nabla^{\perp} e_i \cdot \nabla \omega_s \rangle\, \langle h'(\omega_s),\, \nabla^{\perp} e_j \cdot \nabla \omega_s \rangle\, ds. \tag{12}$$

Due to global existence and uniqueness of solutions to (3), $[h(\omega)]_t$ exists globally $\mathbb{P}$-almost surely. Thus the right hand side of (12) can be arbitrarily well approximated by its truncation for all $t$, i.e. for a given $\epsilon > 0$, there exists $M_\epsilon$ such that

$$\left| [h(\omega)]_t - \sum_{i,j=1}^{M_\epsilon} \alpha_i \alpha_j \int_0^t \langle h'(\omega_s),\ \nabla^\perp e_i \cdot \nabla \omega_s \rangle\, \langle h'(\omega_s),\ \nabla^\perp e_j \cdot \nabla \omega_s \rangle\, ds \right| < \epsilon. \tag{13}$$

Additionally, from the computational perspective, for any fixed truncation level $M_\epsilon$, the linear map

$$A_{ij} := \int_0^t \langle h'(\omega_s),\ \nabla^{\perp} e_i \cdot \nabla \omega_s \rangle\, \langle h'(\omega_s),\ \nabla^{\perp} e_j \cdot \nabla \omega_s \rangle\, ds \tag{14}$$

that defines the truncated quadratic form is symmetric and positive definite,<sup>2</sup> and thus can be diagonalised by a unitary matrix. Doing so, we obtain the following linear problem

$$[h(\omega)]_t = \sum_{j=1}^{M_\epsilon} \tilde{\alpha}_j^2 \lambda_j + \epsilon',\tag{15}$$

where $\epsilon'$ denotes the truncation error in (13), $\lambda_j$ are the eigenvalues of the associated linear map, and the $\tilde\alpha_j$ are the original $\alpha$ values rescaled by the unitary matrix from the diagonalisation.

We can estimate $[h(\omega)]_t$ using the high-frequency-in-time data $h(\omega^*)$ and (10), assuming the discrete sum converges fast enough,

$$[h(\omega)]_t \approx \widehat{[h(\omega)]}_{t,N} := \sum_{i=1}^{N} |h(\omega^*)_{t_i} - h(\omega^*)_{t_{i-1}}|^2. \tag{16}$$

The estimate $\widehat{[h(\omega)]}_{t,N}$ can then be used in (15) to obtain an estimate for the $\tilde\alpha_j$. The original $\alpha_j$ can then be recovered by applying the unitary matrix associated with the diagonalisation of $A_{ij}$.

<sup>2</sup> Since $[h(\omega)]_t$ is strictly positive.
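The diagonalisation step behind (14)–(15) can be sketched numerically: for a symmetric positive semi-definite matrix $A$ (below, a random stand-in for (14)), the eigendecomposition $A = U\,\mathrm{diag}(\lambda)\,U^\top$ turns the quadratic form $\alpha^\top A\,\alpha$ into $\sum_j \lambda_j \tilde\alpha_j^2$ with $\tilde\alpha = U^\top\alpha$, and the original coefficients are recovered as $\alpha = U\tilde\alpha$. The matrix and coefficients here are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
G = rng.normal(size=(M, M))
A = G @ G.T                          # symmetric PSD stand-in for the map (14)
alpha = rng.normal(size=M)           # stand-in basis coefficients

lam, U = np.linalg.eigh(A)           # A = U @ np.diag(lam) @ U.T
alpha_tilde = U.T @ alpha            # rescaled coefficients, as in (15)

qv_direct = alpha @ A @ alpha        # alpha^T A alpha, cf. (12)-(13)
qv_diag = float(np.sum(lam * alpha_tilde ** 2))  # sum_j lambda_j * alpha_tilde_j^2
alpha_recovered = U @ alpha_tilde    # invert the unitary change of basis
```

The two evaluations of the quadratic form agree up to floating-point round-off, and the change of basis is exactly invertible because $U$ is orthogonal.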

*Example 1* Let $h$ be the identity map, and let $e_\kappa = e^{i\kappa\cdot\mathbf{x}}$ be the Fourier basis. Then we have

$$\widehat{[\omega]}_{t,N} = \sum_{\substack{i,j=1\\ \kappa_i,\kappa_j\in\mathbb{Z}^2}}^{\infty} \alpha_i \alpha_j \int_0^t e_{\kappa_i} e_{\kappa_j}\left(\kappa_i^\perp \cdot \nabla\omega_s\right)\left(\kappa_j^\perp \cdot \nabla\omega_s\right) ds. \tag{17}$$

In Sect. 5 we test Eq. (17) numerically on an idealised example and show that we can adequately recover the basis coefficients using our methodology.

*Example 2* In this example, we assume the data are the kinetic energy of the flow,

$$\mathcal{E}_t := \frac{1}{2}\int_{\mathbb{T}^2} |\mathbf{u}_t|^2\, d\mathbf{x}. \tag{18}$$

Thus the data provide "indirect" information about the vorticity. Note that energy data are informative for SALT models, since energy is not a conserved quantity of SALT dynamics.

Below, we avoid calculating the pressure term of the Euler system by utilising the Biot–Savart operator $K$ that links the velocity field to the vorticity field in Eq. (3). For further discussion of this topic see [9] or [10]. We have

$$\mathbf{u}(\mathbf{x}) = (K \star \omega)(\mathbf{x}) = \int\_{\mathbb{T}^2} K(\mathbf{x} - \mathbf{y}) \omega(\mathbf{y}) d\mathbf{y} \tag{19}$$

where

$$K(\mathbf{x}) = \sum\_{\boldsymbol{\kappa} \in \mathbb{Z}^2 \backslash \{0\}} \frac{i\boldsymbol{\kappa}^\perp}{\|\boldsymbol{\kappa}\|^2} e^{i\boldsymbol{\kappa}\cdot\mathbf{x}}.\tag{20}$$

It is known that, for any $k \ge 0$, there exists a constant $C_{k,2}$, independent of $\mathbf{u}$, such that

$$\|\mathbf{u}\|_{k+1,2} \le C_{k,2}\, \|\omega\|_{k,2}.$$

If $\psi : \mathbb{T}^2 \times [0,\infty) \to \mathbb{R}$ is a solution of $\Delta\psi = -\omega$, then $\mathbf{u} = \nabla^\perp\psi$ solves $\omega = \operatorname{curl}\mathbf{u}$, so $\mathbf{u} = -\nabla^\perp\Delta^{-1}\omega$. The reconstruction of $\mathbf{u}$ from $\omega$ is ensured by the incompressibility condition $\nabla\cdot\mathbf{u} = 0$, and a periodic, distributional solution of $\Delta\psi = -\omega$ is given by

$$\psi(\mathbf{x}) = (G \star \omega)(\mathbf{x}),$$

where $G$ is the Green's function of the operator $-\Delta$ on $\mathbb{T}^2$

$$G(\mathbf{x}) = \sum\_{\kappa \in \mathbb{Z}^2 \backslash \{0\}} \frac{e^{i\kappa \cdot \mathbf{x}}}{\|\kappa\|^2}$$

and $\kappa = (\kappa_1, \kappa_2)$, $\kappa^\perp = (\kappa_2, -\kappa_1)$.
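The Biot–Savart reconstruction (19)–(20) is straightforward to implement spectrally: solve $\Delta\psi = -\omega$ mode by mode and take $\mathbf{u} = \nabla^\perp\psi$. The sketch below uses NumPy's FFT on a $2\pi$-periodic grid, with the convention $\nabla^\perp\psi = (\partial_y\psi, -\partial_x\psi)$ so that $\omega = \operatorname{curl}\mathbf{u}$ as above; the grid size and test field are illustrative:

```python
import numpy as np

def biot_savart(omega):
    """Recover u from omega on the 2*pi-periodic torus: solve
    Laplace(psi) = -omega in Fourier space (psi_hat = omega_hat / |k|^2),
    then take u = (d_y psi, -d_x psi)."""
    n = omega.shape[0]
    k = np.fft.fftfreq(n, d=1.0 / n)           # integer wavenumbers
    kx, ky = np.meshgrid(k, k, indexing="ij")  # axis 0 <-> x, axis 1 <-> y
    k2 = kx**2 + ky**2
    k2[0, 0] = 1.0                             # avoid 0/0 on the mean mode
    psi_hat = np.fft.fft2(omega) / k2
    psi_hat[0, 0] = 0.0                        # fix the free constant in psi
    ux = np.real(np.fft.ifft2(1j * ky * psi_hat))    #  d_y psi
    uy = np.real(np.fft.ifft2(-1j * kx * psi_hat))   # -d_x psi
    return ux, uy

# Check on omega = sin(x): then psi = sin(x), so u = (0, -cos(x)).
n = 64
x = 2 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x, indexing="ij")
ux, uy = biot_savart(np.sin(X))
```

For band-limited fields the spectral inversion is exact to round-off, which makes the single-mode test above a convenient check of the sign conventions.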

Combining (11) with the Biot-Savart law (19) we obtain

$$\mathbf{u}_t(\mathbf{x}) = \mathbf{u}_0(\mathbf{x}) - \int_0^t \int_{\mathbb{T}^2} K(\mathbf{x}-\mathbf{y})\, B_s(\mathbf{y};\omega)\, d\mathbf{y}\, ds - \int_0^t \int_{\mathbb{T}^2} K(\mathbf{x}-\mathbf{y})\, \xi(\mathbf{y})\cdot\nabla\omega_s(\mathbf{y})\, d\mathbf{y} \diamond dW_s \tag{21}$$

Using Itô's lemma, we obtain

$$\mathcal{E}_t - \mathcal{E}_0 = -\int_0^t \langle \mathbf{u}_s,\ K \star (B_s - \tfrac{1}{2}\xi\cdot\nabla(\xi\cdot\nabla\omega_s))\rangle\, ds - \int_0^t \langle \mathbf{u}_s,\ K\star(\xi\cdot\nabla\omega_s)\rangle\, dW_s \tag{22}$$

where $\langle\cdot,\cdot\rangle$ is the standard $L^2(\mathbb{T}^2)$ pairing. Thus

$$[\mathcal{E}]_t = \sum_{i,j=1}^{\infty} \alpha_i \alpha_j \int_0^t \langle \mathbf{u}_s,\ K \star (\nabla^{\perp} e_i \cdot \nabla \omega_s) \rangle\, \langle \mathbf{u}_s,\ K \star (\nabla^{\perp} e_j \cdot \nabla \omega_s) \rangle\, ds. \tag{23}$$

#### **4 Robustness**

**Theorem 2** *Let $\omega^1, \omega^2$ be two solutions of the 2D Euler equation* (3) *and $\xi^1, \xi^2$ the corresponding stochastic parameters for each of these two solutions. More precisely, $(\omega^\ell, \xi^\ell)$ for $\ell = 1, 2$ solves*

$$d\omega_t^\ell + \mathbf{u}_t^\ell \cdot \nabla \omega_t^\ell\, dt + \xi^\ell \cdot \nabla \omega_t^\ell\, dW_t = \frac{1}{2} \xi^\ell \cdot \nabla \left(\xi^\ell \cdot \nabla \omega_t^\ell\right) dt. \tag{24}$$

*Then for any $p \ge 2$ there exist constants*<sup>3</sup> *$C = C(p,T)$, $C_{1,p}$, $C_{2,p}$, such that*

$$\mathbb{E}\left[\sup_{t\in[0,T]} e^{-\mathcal{Y}(t)}\left\|\omega_t^1 - \omega_t^2\right\|_2^{2p}\right] \le C_{p,T}\left(\left\|\omega_0^1 - \omega_0^2\right\|_2^{2p} + \left\|\xi^1 - \xi^2\right\|_2^{2p} + \left\|\xi^1 - \xi^2\right\|_{1,2}^{2p}\right) \tag{25}$$

*where*

<sup>3</sup> In this theorem all constants, generically denoted by $C, C_{p,T}, C_{1,p}, C_{2,p}, \tilde{C}$, may differ from line to line and from term to term.

A Pathwise Parameterisation for Stochastic Transport 167

$$\mathcal{Y}(t) := C_{1,p} \int_0^t \|\omega_r^1\|_{k,2}^2\, dr + C_{2,p}\, t$$

*and k >* 4*.*

*Proof of Theorem 2* Let $\bar\omega := \omega^1 - \omega^2$, $\bar{\mathbf{u}} := \mathbf{u}^1 - \mathbf{u}^2$, $\bar\xi := \xi^1 - \xi^2$. Then $\bar\omega$ satisfies

$$\begin{aligned} d\bar\omega_t &+ \left(\bar{\mathbf{u}}_t\cdot\nabla\omega_t^1 + \mathbf{u}_t^2\cdot\nabla\bar\omega_t\right) dt + \left(\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\right) dW_t \\ &= \frac{1}{2}\left(\xi^1\cdot\nabla(\xi^1\cdot\nabla\omega_t^1) - \xi^2\cdot\nabla(\xi^2\cdot\nabla\omega_t^2)\right) dt. \end{aligned}$$

By the Itô formula:

$$\begin{aligned} d\|\bar\omega_t\|_2^2 =\ & -2\langle\bar\omega_t,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle\, dW_t - 2\langle\bar\omega_t,\ \bar{\mathbf{u}}_t\cdot\nabla\omega_t^1 + \mathbf{u}_t^2\cdot\nabla\bar\omega_t\rangle\, dt \\ &+ \Big(\langle\bar\omega_t,\ \xi^1\cdot\nabla(\xi^1\cdot\nabla\omega_t^1) - \xi^2\cdot\nabla(\xi^2\cdot\nabla\omega_t^2)\rangle \\ &\quad + \langle\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle\Big)\, dt. \end{aligned} \tag{26}$$

We introduce the following notation:

$$\begin{aligned}
m_t &:= \|\bar\omega_t\|_2^2, \qquad Z := \|\bar\xi\|_2^2, \qquad L := \|\bar\xi\|_{1,2}^2, \\
A_t &:= -2\langle\bar\omega_t,\ \bar{\mathbf{u}}_t\cdot\nabla\omega_t^1 + \mathbf{u}_t^2\cdot\nabla\bar\omega_t\rangle + \langle\bar\omega_t,\ \xi^1\cdot\nabla(\xi^1\cdot\nabla\omega_t^1) - \xi^2\cdot\nabla(\xi^2\cdot\nabla\omega_t^2)\rangle \\
&\qquad + \langle\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle, \\
D_t &:= \int_0^t \langle\bar\omega_s,\ \xi^1\cdot\nabla\omega_s^1 - \xi^2\cdot\nabla\omega_s^2\rangle\, dW_s, \\
\phi(t) &:= C\|\omega_t^1\|_{k,2}^2 + C, \\
\psi(t) &:= \left(C\|\omega_t^1\|_{k,2}^2 + C\right) Z + C\|\omega_t^1\|_{k,2}^2\, L, \\
\tilde{Z} &:= C\|\omega_t^1\|_{k,2}^2\, Z.
\end{aligned}$$

Then we can write (26) as

$$dm_t = A_t\, dt - 2\, dD_t.$$

We want to estimate each of the terms which appear in (26). The difference of the nonlinear terms is analysed explicitly in [8, p. 9]:

$$\langle\bar\omega_t,\ \bar{\mathbf{u}}_t\cdot\nabla\omega_t^1\rangle \le \|\bar\omega_t\|_2\, \|\bar{\mathbf{u}}_t\|_4\, \|\nabla\omega_t^1\|_4 \le C\|\bar\omega_t\|_2^2\, \|\omega_t^1\|_{k,2} \le C\left(\|\omega_t^1\|_{k,2}^2 + 1\right) m_t$$

We used here that $\|\nabla\omega_t^1\|_4 \le C\|\omega_t^1\|_{k,2}$ and $\|\bar{\mathbf{u}}_t\|_4 \le C\|\bar{\mathbf{u}}_t\|_{1,2} \le C\|\bar\omega_t\|_2$. Also, since $\mathbf{u}^2$ is divergence-free, $\langle\bar\omega_t,\ \mathbf{u}_t^2\cdot\nabla\bar\omega_t\rangle = -\frac{1}{2}\int_{\mathbb{T}^2}(\nabla\cdot\mathbf{u}_t^2)(\bar\omega_t)^2\, d\mathbf{x} = 0$. We estimate the difference terms which include $\xi^1$ and $\xi^2$ in Lemma 3 below. Note here that the term $\langle\bar\omega_t,\ \xi^2\cdot\nabla(\xi^2\cdot\nabla\bar\omega_t)\rangle$ is negative. Using these estimates and Lemma 3 below we have that

$$A_t\, dt \le \psi(t)\, dt + \phi(t)\, m_t\, dt.$$

Then

$$\begin{aligned} d\left(e^{-\int_0^t \phi(s)ds}\, m_t\right) &= e^{-\int_0^t \phi(s)ds}\left(dm_t - \phi(t)\, m_t\, dt\right) \\ &\le e^{-\int_0^t \phi(s)ds}\left(\psi(t)\, dt - 2\, dD_t\right). \end{aligned}$$

After raising everything to the power *<sup>p</sup>* <sup>≥</sup> 2,<sup>4</sup> taking the supremum over *<sup>t</sup>* ∈ [0*, T* ] and then the expectation, we obtain

$$\begin{aligned} \mathbb{E}\left[\sup_{t\in[0,T]}\left(e^{-\int_0^t \phi(s)ds}\, m_t\right)^p\right] \le\ & C_p\, m_0^p + C_p\, \mathbb{E}\left[\sup_{t\in[0,T]}\left|\int_0^t e^{-\int_0^s \phi(r)dr}\, \psi(s)\, ds\right|^p\right] \\ &+ C_p\, \mathbb{E}\left[\sup_{t\in[0,T]}\left|\int_0^t e^{-\int_0^s \phi(r)dr}\, dD_s\right|^p\right] \end{aligned} \tag{27}$$

For the stochastic integral we use the Burkholder–Davis–Gundy inequality: for arbitrary $p \ge 2$ and a martingale $M_t$ there exists a constant $C_p$ such that<sup>5</sup>

$$\mathbb{E}\left[\sup_{t\in[0,T]}|M_t|^p\right] \le C_p\, \mathbb{E}\left[[M]_T^{p/2}\right]$$

where $[M]_t$ is the quadratic variation of the martingale $M_t$. In our case

<sup>4</sup> We use here and below that $|a + b|^p \le 2^{p-1}(|a|^p + |b|^p)$, $p \ge 2$.

<sup>5</sup> In this proof *C,Cp* are generic constants which may differ from line to line and from term to term.


$$M_t := \int_0^t e^{-\int_0^s \phi(r)dr}\, dD_s$$

and then

$$[M]_t = \int_0^t e^{-2\int_0^s \phi(r)dr}\, d[D]_s = \int_0^t e^{-2\int_0^s \phi(r)dr}\left|\langle\bar\omega_s,\ \xi^1\cdot\nabla\omega_s^1 - \xi^2\cdot\nabla\omega_s^2\rangle\right|^2 ds.$$

Therefore<sup>6</sup>

$$\begin{aligned} \mathbb{E}\left[[M]_T^{p/2}\right] &= \mathbb{E}\left[\left(\int_0^T e^{-2\int_0^s \phi(r)dr}\left|\langle\bar\omega_s,\ \xi^1\cdot\nabla\omega_s^1 - \xi^2\cdot\nabla\omega_s^2\rangle\right|^2 ds\right)^{p/2}\right] \\ &\le C_{p,T}\, \mathbb{E}\left[\int_0^T e^{-p\int_0^s \phi(r)dr}\left|\langle\bar\omega_s,\ \xi^1\cdot\nabla\omega_s^1 - \xi^2\cdot\nabla\omega_s^2\rangle\right|^p ds\right] \\ &\le C_{p,T}\int_0^T \mathbb{E}\left[\sup_{r\in[0,s]} e^{-p\int_0^r \phi(q)dq}\left(m_r^p + \tilde{Z}^p\right)\right] ds. \end{aligned}$$

Using these estimates in (27) we obtain

$$\begin{aligned} \mathbb{E}\left[\sup_{t\in[0,T]}\left(e^{-\int_0^t \phi(s)ds}\, m_t\right)^p\right] \le\ & C_p\, m_0^p + C_{p,T}\, \mathbb{E}\left[\sup_{t\in[0,T]}\int_0^t e^{-p\int_0^s \phi(r)dr}\, \psi(s)^p\, ds\right] \\ &+ C_{p,T}\int_0^T \mathbb{E}\left[\sup_{r\in[0,s]} e^{-p\int_0^r \phi(q)dq}\left(m_r^p + \tilde{Z}^p\right)\right] ds \end{aligned} \tag{28}$$

For the second term on the right hand side of (28) we use that, since $Z$ is deterministic and, by [8], the 2D Euler equation (3) has a unique global solution in $\mathcal{W}^{k,2}(\mathbb{T}^2)$ for $k \ge 2$, there exist $\tilde{C}_p^1, \tilde{C}_p^2$ such that for all $t \in [0,T]$

<sup>6</sup> We use here the control obtained for $\mathcal{Q}$ in Lemma 3. More precisely: since $\mathcal{Q} \le C m_t + \tilde{Z}$, then $\mathcal{Q}^p \le C_p(m_t^p + \tilde{Z}^p)$.

$$\begin{aligned} \mathbb{E}\left[\sup_{s\in[0,t]}\psi(s)^p\right] &= \mathbb{E}\left[\sup_{s\in[0,t]}\left(\left(C\|\omega_s^1\|_{k,2}^2 + C\right)Z + C\|\omega_s^1\|_{k,2}^2\, L\right)^p\right] \\ &\le Z^p\, \mathbb{E}\left[\sup_{s\in[0,t]}\left(C\|\omega_s^1\|_{k,2}^2 + C\right)^p\right] + L^p\, \mathbb{E}\left[\sup_{s\in[0,t]}\left(C\|\omega_s^1\|_{k,2}^2\right)^p\right] \\ &\le \tilde{C}_p^1 Z^p + \tilde{C}_p^2 L^p. \end{aligned}$$

The same argument is used to control $\int_0^T \mathbb{E}\left[\sup_{r\in[0,s]} e^{-p\int_0^r \phi(q)dq}\, \tilde{Z}^p\right] ds$ in the third term of (28). Then

$$\begin{aligned} \mathbb{E}\left[\sup_{t\in[0,T]}\left(e^{-\int_0^t \phi(s)ds}\, m_t\right)^p\right] \le\ & C_{p,T}^1\left(m_0^p + Z^p + L^p\right) \\ &+ C_{p,T}^2\int_0^T \mathbb{E}\left[\sup_{r\in[0,s]}\left(e^{-\int_0^r \phi(q)dq}\, m_r\right)^p\right] ds. \end{aligned}$$

Then, by Gronwall's lemma (Lemma 6),

$$\begin{aligned} \mathbb{E}\left[\sup_{t\in[0,T]}\left(e^{-\int_0^t \phi(s)ds}\, m_t\right)^p\right] &\le e^{\int_0^T C_{p,T}^2\, ds}\left(m_0^p + \int_0^T C_{p,T}^1\left(m_0^p + Z^p + L^p\right) ds\right) \\ &\le e^{C(T)}\left(m_0^p + T\, C_{p,T}^1\left(m_0^p + Z^p + L^p\right)\right). \end{aligned}$$

So we finally obtain that

$$\begin{aligned} &\mathbb{E}\left[\sup_{t\in[0,T]} e^{-\gamma(t)}\, \|\omega_t^1 - \omega_t^2\|_2^{2p}\right] \\ &\quad\le C_{p,T}\left(\|\omega_0^1 - \omega_0^2\|_2^{2p} + \|\xi^1 - \xi^2\|_2^{2p} + \|\xi^1 - \xi^2\|_{1,2}^{2p}\right), \quad p \ge 2, \end{aligned}$$

where

$$\gamma(t) := p\int_0^t \phi(r)\, dr.$$

**Lemma 3** *Let $(\omega_t^1, \xi^1)$ and $(\omega_t^2, \xi^2)$ be two solutions of the 2D Euler equation, with $\bar\omega_t := \omega_t^1 - \omega_t^2$ and $\bar\xi := \xi^1 - \xi^2$. Then there exist constants $C$*<sup>7</sup> *such that the following estimates hold:*

$$\mathcal{Q} := |\langle\bar\omega_t,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle| \le C\|\bar\omega_t\|_2^2 + C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_2^2,$$

$$A := \langle\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle \le C\|\bar\omega_t\|_2^2 + C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_2^2,$$

$$|B| \le \left(C\|\omega_t^1\|_{k,2}^2 + C\right)\|\bar\omega_t\|_2^2 + C\|\bar\xi\|_2^2 + C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_{1,2}^2$$

where

$$B := \langle\bar\omega_t,\ \xi^1\cdot\nabla\left(\xi^1\cdot\nabla\omega_t^1\right) - \xi^2\cdot\nabla\left(\xi^2\cdot\nabla\omega_t^2\right)\rangle,$$

and $k > 4$.

*Proof* For the difference terms which include $\xi^1$ and $\xi^2$ we use that

$$\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2 = \bar\xi\cdot\nabla\omega_t^1 + \xi^2\cdot\nabla\bar\omega_t.$$

We have

$$\begin{aligned} \mathcal{Q} &= |\langle\bar\omega_t,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle| \\ &\le |\langle\omega_t^1 - \omega_t^2,\ (\xi^1 - \xi^2)\cdot\nabla\omega_t^1\rangle| + |\langle\omega_t^1 - \omega_t^2,\ \xi^2\cdot\nabla(\omega_t^1 - \omega_t^2)\rangle| \\ &\le \frac{1}{2}\|\omega_t^1 - \omega_t^2\|_2^2 + \frac{1}{2}\|\nabla\omega_t^1\|_\infty^2\|\xi^1 - \xi^2\|_2^2 \\ &\le \frac{1}{2}\|\bar\omega_t\|_2^2 + \frac{C}{2}\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_2^2 \end{aligned}$$

with $k \ge 3$, since the second scalar product is zero due to the fact that $\nabla\cdot\xi^2 = 0$. Also

$$\begin{aligned} A &= \langle\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2,\ \xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\rangle = \|\xi^1\cdot\nabla\omega_t^1 - \xi^2\cdot\nabla\omega_t^2\|_2^2 \\ &\le \|(\xi^1 - \xi^2)\cdot\nabla\omega_t^1\|_2^2 + \|\xi^2\cdot\nabla(\omega_t^1 - \omega_t^2)\|_2^2 \\ &\le \|\xi^1 - \xi^2\|_2^2\|\nabla\omega_t^1\|_\infty^2 + C\|\omega_t^1 - \omega_t^2\|_2^2 \\ &\le C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_2^2 + C\|\bar\omega_t\|_2^2 \end{aligned}$$

where *k* ≥ 3. For the higher order term we have

<sup>7</sup> *C* differs from line to line and from term to term depending on the Sobolev embedding we use.

$$\begin{aligned} B &= \langle\omega_t^1 - \omega_t^2,\ \xi^1\cdot\nabla\left(\xi^1\cdot\nabla\omega_t^1\right) - \xi^2\cdot\nabla\left(\xi^2\cdot\nabla\omega_t^2\right)\rangle \\ &= \langle\omega_t^1 - \omega_t^2,\ \left(\xi^1 - \xi^2\right)\cdot\nabla\left(\xi^1\cdot\nabla\omega_t^1\right)\rangle \\ &\quad + \langle\omega_t^1 - \omega_t^2,\ \xi^2\cdot\nabla\left(\left(\xi^1 - \xi^2\right)\cdot\nabla\omega_t^1\right)\rangle \\ &\quad + \langle\omega_t^1 - \omega_t^2,\ \xi^2\cdot\nabla\left(\xi^2\cdot\nabla\left(\omega_t^1 - \omega_t^2\right)\right)\rangle \\ &=: a + b + c. \end{aligned}$$

Note that *c* is negative:

$$\begin{aligned} \langle\omega_t^1 - \omega_t^2,\ \xi^2\cdot\nabla\left(\xi^2\cdot\nabla(\omega_t^1 - \omega_t^2)\right)\rangle &= -\langle\xi^2\cdot\nabla(\omega_t^1 - \omega_t^2),\ \xi^2\cdot\nabla(\omega_t^1 - \omega_t^2)\rangle \\ &= -\|\xi^2\cdot\nabla(\omega_t^1 - \omega_t^2)\|_2^2 \\ &\le 0 \end{aligned}$$

so |*B*|≤|*a*|+|*b*|. We estimate |*a*| as follows:

$$\begin{aligned} |a| &= |\langle\omega_t^1 - \omega_t^2,\ (\xi^1 - \xi^2)\cdot\nabla(\xi^1\cdot\nabla\omega_t^1)\rangle| \le \frac{1}{2}\|\nabla(\xi^1\cdot\nabla\omega_t^1)\|_\infty^2\|\omega_t^1 - \omega_t^2\|_2^2 + \frac{1}{2}\|\xi^1 - \xi^2\|_2^2 \\ &\le \frac{C}{2}\|\omega_t^1\|_{2,\infty}^2\|\omega_t^1 - \omega_t^2\|_2^2 + \frac{1}{2}\|\xi^1 - \xi^2\|_2^2 \\ &\le \frac{C}{2}\|\omega_t^1\|_{k,2}^2\|\bar\omega_t\|_2^2 + \frac{1}{2}\|\bar\xi\|_2^2 \end{aligned}$$

with *k >* 4. Likewise, we estimate |*b*|:

$$\begin{aligned} |b| &= |\langle\omega_t^1 - \omega_t^2,\ \xi^2\cdot\nabla\left((\xi^1 - \xi^2)\cdot\nabla\omega_t^1\right)\rangle| \le \frac{1}{2}\|\omega_t^1 - \omega_t^2\|_2^2 + \frac{1}{2}\|\xi^2\cdot\nabla\left((\xi^1 - \xi^2)\cdot\nabla\omega_t^1\right)\|_2^2 \\ &=: \frac{1}{2}\|\omega_t^1 - \omega_t^2\|_2^2 + \frac{1}{2}\mathcal{K}. \end{aligned}$$

Now

$$\mathcal{K} \le \|\xi^2\cdot\nabla(\xi^1 - \xi^2)\cdot\nabla\omega_t^1\|_2^2 + \|\xi^2\cdot(\xi^1 - \xi^2)\cdot\nabla(\nabla\omega_t^1)\|_2^2 =: \mathcal{K}_1 + \mathcal{K}_2$$

where

$$\begin{aligned} \mathcal{K}_1 &\le \|\xi^2\cdot\nabla\omega_t^1\|_\infty^2\|\nabla(\xi^1 - \xi^2)\|_2^2 \\ &\le C\|\omega_t^1\|_{1,\infty}^2\|\xi^1 - \xi^2\|_{1,2}^2 \\ &\le C\|\omega_t^1\|_{k,2}^2\|\xi^1 - \xi^2\|_{1,2}^2 \end{aligned}$$

and

$$\begin{aligned} \mathcal{K}_2 &\le C\|\xi^2\cdot\nabla(\nabla\omega_t^1)\|_4^2\|\xi^1 - \xi^2\|_4^2 \\ &\le C\|\xi^2\cdot\nabla(\nabla\omega_t^1)\|_{1,2}^2\|\xi^1 - \xi^2\|_{1,2}^2 \\ &\le C\|\omega_t^1\|_{k,2}^2\|\xi^1 - \xi^2\|_{1,2}^2 \end{aligned}$$

for *k >* 4. Then

$$\mathcal{K} \le 2C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_{1,2}^2$$

and therefore

$$|b| \le \frac{1}{2}\|\bar\omega_t\|_2^2 + C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_{1,2}^2$$

which gives

$$|B| \le \left(\frac{C}{2}\|\omega_t^1\|_{k,2}^2 + \frac{1}{2}\right)\|\bar\omega_t\|_2^2 + \frac{1}{2}\|\bar\xi\|_2^2 + C\|\omega_t^1\|_{k,2}^2\|\bar\xi\|_{1,2}^2.$$

#### **5 Numerical Results**

In this section, we present the results obtained for Example 1 of Sect. 3. We implemented the main equation (3), with added forcing and damping, on a unit square domain with doubly periodic boundary conditions,

$$d\omega_t + \mathbf{u}_t\cdot\nabla\omega_t\, dt + \xi\cdot\nabla\omega_t \diamond dW_t = (Q - r\omega_t)\, dt\tag{29}$$

where we chose $r = 0.001$ and $Q(\mathbf{x}) = 0.01(\cos(8\pi y) + \sin(8\pi x))$. Note that, since the added forcing term is of bounded variation, (17) is unchanged for (29).

We considered a $\xi$ whose parametric form with respect to the Fourier basis consists of a single coefficient $\alpha$. The stream function of our chosen $\xi$ is given by

$$\zeta(x, y) = \alpha\left(\cos(2\pi k_1 x)\cos(2\pi k_2 y) - \sin(2\pi k_1 x)\sin(2\pi k_2 y)\right). \tag{30}$$

Note that

$$\zeta = \frac{\alpha}{2}\left(e^{i2\pi \mathbf{k}\cdot\mathbf{x}} + e^{-i2\pi \mathbf{k}\cdot\mathbf{x}}\right),\tag{31}$$

and

$$\xi = i\alpha\pi\left(e^{i2\pi \mathbf{k}\cdot\mathbf{x}} - e^{-i2\pi \mathbf{k}\cdot\mathbf{x}}\right)\mathbf{k}^\perp. \tag{32}$$

**Fig. 1** Snapshots of the numerical solution $\omega(t, \mathbf{x})$ to (29) at times $t = 0$ (left) and $t = 1$ (right)

To discretise (29), we followed the methods documented in [4]: a mixed finite element method was used for the spatial derivatives, and an explicit strong stability preserving Runge–Kutta scheme of order 3 was used for the time stepping. We added the forcing and damping terms to help maintain the statistical homogeneity of the numerical solution once it has reached a spun-up state from a prescribed initial condition. Our choice for this initial condition was

$$\begin{aligned} \omega(0, x, y) &= \sin(8\pi x)\sin(8\pi y) + 0.4\cos(6\pi x)\cos(6\pi y) \\ &\quad + 0.3\cos(10\pi x)\cos(4\pi y) + 0.02\sin(2\pi y) + 0.02\sin(2\pi x). \end{aligned} \tag{33}$$

Spatially, we used a grid of 64 × 64 cells. We first spun up the system until it reached a statistical equilibrium state, which was then set as the initial condition for our experiment. Figure 1 shows a snapshot of the obtained initial condition. Over the spin-up phase, we used $\alpha = 0.000001$ and $\mathbf{k} = (2, 4)$.
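For reference, the third-order strong stability preserving Runge–Kutta scheme used for the time stepping can be written in the standard Shu–Osher form. The sketch below is a generic ODE stepper, with the right-hand side $f$ standing in for the spatially discretised operator (which in our case comes from the mixed finite element method of [4]); it checks third-order behaviour on the toy problem $du/dt = -u$:

```python
import math

def ssprk3_step(f, u, dt):
    """One step of the explicit third-order strong-stability-preserving
    Runge-Kutta scheme in Shu-Osher form."""
    u1 = u + dt * f(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * f(u1))
    return u / 3.0 + (2.0 / 3.0) * (u2 + dt * f(u2))

# Sanity check on du/dt = -u, u(0) = 1, integrated to t = 1.
u, dt, steps = 1.0, 0.01, 100
for _ in range(steps):
    u = ssprk3_step(lambda v: -v, u, dt)
# u is close to exp(-1), with a third-order-accurate global error
```

Each stage is a convex combination of forward Euler steps, which is what gives the scheme its strong stability preserving property.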

The time horizon for the experiment data was chosen to be the unit interval, i.e. we generated data $\omega^*(t_i, \mathbf{x})$ for $0 = t_0 < t_1 < \cdots < t_N = 1$. See Fig. 1 for snapshots of $\omega^*(0, \mathbf{x})$ and $\omega^*(1, \mathbf{x})$. When generating the data, we used the larger value $\alpha = 0.001$. This was to avoid possible numerical issues<sup>8</sup> when recovering $\alpha$ from the data.

Assuming we know the exact Fourier wavenumber $\mathbf{k}$ in advance, the linear system for estimation reduces to

<sup>8</sup> When *α* is small, *α*<sup>2</sup> is close to machine precision.

$$\widehat{[\omega]}_{t,N}(\mathbf{x}) := \sum_{i=1}^{N}\left(\omega_{t_i}(\mathbf{x}) - \omega_{t_{i-1}}(\mathbf{x})\right)^2 \approx \alpha^2\, 4\pi^2\, B(t, \mathbf{k}, \mathbf{x})\, e_{\mathbf{k}}'(\mathbf{x})\tag{34}$$

where

$$B(t, \mathbf{k}, \mathbf{x}) := \int_0^t \left(\mathbf{k}^\perp\cdot\nabla\omega_s(\mathbf{x})\right)^2 ds \tag{35}$$

and

$$e_{\mathbf{k}}'(x, y) := \left(\cos(2\pi k_1 x)\sin(2\pi k_2 y) + \sin(2\pi k_1 x)\cos(2\pi k_2 y)\right)^2. \tag{36}$$

Thus our estimate for *α* is given by

$$\widehat{\alpha}_N^2 = \frac{1}{4\pi^2}\, \frac{\int_{\mathbb{T}^2}\widehat{[\omega]}_{t,N}(\mathbf{x})\, d\mathbf{x}}{\int_{\mathbb{T}^2} B(t, \mathbf{k}, \mathbf{x})\, e_{\mathbf{k}}'(\mathbf{x})\, d\mathbf{x}}. \tag{37}$$

*Remark 4* In (37), we applied spatial averaging to stabilise estimation.

*Remark 5* The assumption that we know *k* in advance is of course too strong from the applications viewpoint. The aim of this experiment is to test the strength of the pathwise approach under the assumption of "perfect knowledge". If we cannot accurately recover *α* in this case, then getting a good estimate for *α* using the pathwise approach may be too difficult or impractical in more realistic scenarios.
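The logic of the estimator (37) can be illustrated on a toy scalar analogue (a hypothetical one-dimensional stand-in, not the PDE experiment itself): for $dX_t = b(X_t)\,dt + \alpha g\, dW_t$ with known $g$, the quadratic variation is $[X]_t = \alpha^2 g^2 t$, so $\alpha$ is recovered from the realized quadratic variation of a high-frequency sample path, the drift contributing only at order $\Delta t$. All numerical values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
alpha_true, g = 0.001, 3.0       # "true" coefficient and known basis factor
N, T = 100_000, 1.0              # high-frequency sample of [0, T]
dt = T / N

# Euler-Maruyama path with a bounded-variation drift (cf. (29)).
x = np.empty(N + 1)
x[0] = 0.5
noise = rng.normal(0.0, np.sqrt(dt), size=N)
for i in range(N):
    x[i + 1] = x[i] - 0.1 * x[i] * dt + alpha_true * g * noise[i]

qv = np.sum(np.diff(x) ** 2)             # realized quadratic variation, cf. (16)
alpha_hat = np.sqrt(qv / (g**2 * T))     # analogue of the estimator (37)
rel_err = abs(alpha_true - alpha_hat) / alpha_true
```

Because the drift enters the squared increments only at order $\Delta t$, the relative error of $\widehat\alpha$ is dominated by the sampling error of the quadratic variation, of order $\sqrt{2/N}$.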

Figure 2 shows snapshots of $\widehat{[\omega]}_{t,N}(\mathbf{x})$ and $B(t, \mathbf{k}, \mathbf{x})\, e_{\mathbf{k}}'(\mathbf{x})$. We applied (37) for different values of $N$. In each case, the time integral that constitutes $B(t, \mathbf{k}, \mathbf{x})$ was approximated using a simple trapezoidal rule, with the same $N$ data snapshots. Figure 3 shows the results for the relative error

$$\text{err}\_N = \frac{|\alpha - \widehat{\alpha}\_N|}{\alpha} \tag{38}$$

for the different values of $N$. The results show that, in the worst case of $N = 2500$, the relative error was no greater than 0.89, which translates to an absolute error range of $0.001 \pm 0.00089$. The best case used all 200,000 data samples to estimate $\alpha$; the relative error in that case was 0.00135. This suggests convergence and stabilisation of the sum for $\widehat{[\omega]}_{t,N}$.

For future work, we aim to test the pathwise approach for cases in which we do not know the exact selection of basis elements for *ξ* . Further, we wish to extend and test these ideas on coarse grained PDE data and compare with the results that were obtained in [4] using previously developed calibration methods.

**Fig. 2** Shown on the left is a snapshot of the estimate $\widehat{[\omega]}_{t,N}$, which was computed using $N$ = 200,000 data samples. Shown on the right is a snapshot of $B(t, \mathbf{k}, \mathbf{x})\left(\cos(2\pi k_1 x)\sin(2\pi k_2 y) + \sin(2\pi k_1 x)\cos(2\pi k_2 y)\right)^2$, which was approximated using the same $N$ data samples

**Fig. 3** The plot (in log log scale) shows the relative error err*<sup>N</sup>* defined in (38) as a function of *N*. err*<sup>N</sup>* was computed for *N* = 2500*,* 5000*,* 10*,*000*,* 20*,*000*,* 40*,*000*,* 50*,*000*,* 66*,*667*,* 100*,*000*,* 200*,*000

**Acknowledgments** The authors would like to thank Prof Dan Crisan for the many helpful suggestions and constructive ideas he shared with them during the preparation of this work. They also thank Prof Darryl Holm, Prof Bertrand Chapron, Prof Etienne Mémin, and the whole STUOD team for many inspiring discussions they had during the STUOD meetings.

#### **Funding**

Both authors were partially supported by the European Research Council (ERC) under the European Union's Horizon 2020 Research and Innovation Programme (ERC, Grant Agreement No 856408).

#### **Appendix**

**Lemma 6 (Gronwall Lemma)** *Let $\beta : [0, T] \to [0, \infty)$ be a non-negative absolutely continuous function that satisfies, for a.e. $t$,*

$$d\beta(t) \le \phi(t)\beta(t)dt + \psi(t)dt$$

*where φ,ψ are non-negative integrable functions on* [0*, T* ]*. Then*

$$
\beta(t) \le e^{\int\_0^t \phi(s)ds} \left(\beta(0) + \int\_0^t \psi(s)ds\right),
$$

*for all t* ∈ [0*, T* ]*.*
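The inequality above can be checked numerically. The sketch below, with illustrative coefficients $\phi(t) = 2 + \sin(2\pi t)$ and $\psi(t) = 1/2$ (not taken from the chapter), integrates the equality case of the differential inequality with forward Euler and verifies that the trajectory stays below the Gronwall bound.

```python
import numpy as np

# Numerical check of the Gronwall estimate: if dbeta <= phi*beta dt + psi dt,
# then beta(t) <= exp(int_0^t phi) * (beta(0) + int_0^t psi).
# phi, psi below are illustrative non-negative integrable functions.

def gronwall_bound(beta0, phi, psi, t):
    """Right-hand side of the Gronwall estimate on a time grid t."""
    dt = np.diff(t, prepend=t[0])
    int_phi = np.cumsum(phi * dt)
    int_psi = np.cumsum(psi * dt)
    return np.exp(int_phi) * (beta0 + int_psi)

t = np.linspace(0.0, 1.0, 2001)
phi = 2.0 + np.sin(2 * np.pi * t)   # non-negative, integrable
psi = 0.5 * np.ones_like(t)         # non-negative, integrable
beta0 = 1.0

# Integrate the equality case dbeta = phi*beta dt + psi dt with forward Euler;
# its solution must stay below the Gronwall bound (up to discretisation error).
beta = np.empty_like(t)
beta[0] = beta0
for n in range(len(t) - 1):
    dt = t[n + 1] - t[n]
    beta[n + 1] = beta[n] + (phi[n] * beta[n] + psi[n]) * dt

bound = gronwall_bound(beta0, phi, psi, t)
```

The bound is not sharp: the exact solution carries a factor $e^{-\int_0^s \phi}\le 1$ inside the $\psi$ integral, which the estimate discards.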

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Stochastic Parameterization with Dynamic Mode Decomposition**

**Long Li, Etienne Mémin, and Gilles Tissot**

**Abstract** A physical stochastic parameterization is adopted in this work to account for the effects of the unresolved small scales on the large-scale flow dynamics. This random model is based on a stochastic transport principle, which ensures a strong energy conservation. The dynamic mode decomposition (DMD) is performed on high-resolution data to learn a basis of the unresolved velocity field, on which the stochastic transport velocity is expressed. The time-harmonic property of DMD modes allows us to perform a clean separation between time-differentiable and time-decorrelated components. Such a random scheme is assessed on a quasi-geostrophic (QG) model.

**Keywords** Stochastic parameterization · Dynamical system · Data-driven

#### **1 Introduction**

Modelling under location uncertainty (LU) has been shown to provide consistent physical representations of fluid dynamics [10, 12]. This representation introduces a random component to describe the unresolved flow components, which makes it possible to consider less dissipative systems than the classical large-scale counterparts. Nevertheless, the ability of such a model to faithfully represent the uncertainties associated with the actual unresolved small scales depends strongly on the definition of the random component and on its evolution in time. Unsurprisingly, stationary/time-varying and homogeneous/inhomogeneous characteristics have strong influences on the results [1, 2]. Another important aspect concerns the ability to include in the noise representation a stationary drift component associated with the temporal mean of the high-resolution fluctuations. As shown in this paper, such a stationary drift can be elegantly introduced in the noise through the Girsanov theorem.

L. Li (✉) · E. Mémin · G. Tissot

Inria Rennes - Bretagne Atlantique, Campus de Beaulieu, Rennes, France
e-mail: long.li@inria.fr

Yet, large-scale persistent components associated with the high-resolution fluctuations are not strictly stationary, and slowly varying quasi-periodic components might be important to include. To that purpose, we devise a noise generation scheme relying on the dynamic mode decomposition (DMD) [13]. Such a decomposition, or other related techniques aiming to provide a spectral representation of the Koopman operator [11], allows us to represent the noise as a superposition of random and deterministic harmonic oscillators. The former are attached to the fast components, whereas the latter represent the slow fluctuation components. As demonstrated in Sect. 4, this strategy provides a very efficient technique for the ocean double-gyre configuration.

#### **2 Modelling Under Location Uncertainty**

In this section, we briefly review the LU setting and the associated random QG model that will be used for the numerical evaluations.

#### *2.1 Stochastic Flow*

The evolution of a Lagrangian particle trajectory $(X_t)$ under LU is described by the following stochastic differential equation (SDE):

$$\mathrm{d}X_t(\mathbf{x}) = \boldsymbol{v}\left(X_t(\mathbf{x}), t\right)\mathrm{d}t + \sigma\left(X_t(\mathbf{x}), t\right)\mathrm{d}B_t, \qquad X_0(\mathbf{x}) = \mathbf{x} \in \mathcal{D}, \tag{1}$$

where $\boldsymbol{v}$ denotes the time-smooth resolved velocity that is both spatially and temporally correlated, $\sigma\mathrm{d}B_t$ stands for the fast oscillating unresolved flow component (also called *noise* in the following) that is only correlated in space, and $\mathcal{D} \subset \mathbb{R}^d$ ($d = 2$ or 3) is a bounded spatial domain.
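The SDE (1) can be discretised with a standard Euler-Maruyama step. The sketch below simulates one 2D particle under an illustrative drift (solid-body rotation) and a spatially constant noise operator; both are assumptions for demonstration, not the model used later in the chapter.

```python
import numpy as np

# Minimal Euler-Maruyama sketch of the stochastic flow (1),
# dX_t = v(X_t, t) dt + sigma(X_t, t) dB_t, for a single 2D particle.
# Drift and noise amplitude below are illustrative assumptions.

rng = np.random.default_rng(0)

def v(x, t):
    # time-smooth resolved velocity: solid-body rotation
    return np.array([-x[1], x[0]])

def sigma(x, t):
    # spatially constant 2x2 noise operator (homogeneous noise)
    return 0.05 * np.eye(2)

dt, n_steps = 1e-3, 1000
x = np.array([1.0, 0.0])                        # X_0 = x
for n in range(n_steps):
    dB = rng.standard_normal(2) * np.sqrt(dt)   # Brownian increment
    x = x + v(x, n * dt) * dt + sigma(x, n * dt) @ dB

# after t = 1 the particle has rotated by about one radian, up to noise
```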

We now give the mathematical definition of the noise. In the following, let us fix a finite time $T < \infty$ and the Hilbert space $H = (L^2(\mathcal{D}))^d$ with the inner product $\langle \boldsymbol{f}, \boldsymbol{g} \rangle_H = \int_{\mathcal{D}} (\boldsymbol{f}^{\dagger}\boldsymbol{g})(\mathbf{x})\,\mathrm{d}\mathbf{x}$ and the norm $\|\boldsymbol{f}\|_H = \langle \boldsymbol{f}, \boldsymbol{f} \rangle_H^{1/2}$, where $\bullet^{\dagger}$ stands for the transpose-conjugate operation. Then, $\{B_t\}_{0\le t\le T}$ is an $H$-valued cylindrical Brownian motion (see the definition in [4]) on a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0\le t\le T}, \mathbb{P})$, with covariance operator $\mathrm{diag}(\mathbf{I}_d)$ (where $\mathbf{I}_d$ is a $d$-dimensional vector of identity operators). For each $(\omega, t) \in \Omega \times [0, T]$, we constrain $\sigma(\cdot, t)$ to be a (random) Hilbert-Schmidt integral operator on $H$ with a bounded matrix kernel $\breve{\sigma} = (\breve{\sigma}_{ij})_{i,j=1,\dots,d}$ such that

$$\sigma(\mathbf{x},t)\,\boldsymbol{f} = \int_{\mathcal{D}} \breve{\sigma}(\mathbf{x},\mathbf{y},t)\,\boldsymbol{f}(\mathbf{y})\,\mathrm{d}\mathbf{y}, \quad \boldsymbol{f} \in H, \quad \mathbf{x} \in \mathcal{D}. \tag{2a}$$

Its adjoint operator $\sigma^*(\cdot,t)$, satisfying $\langle \sigma(\cdot,t)\boldsymbol{f}, \boldsymbol{g} \rangle_H = \langle \boldsymbol{f}, \sigma^*(\cdot,t)\boldsymbol{g} \rangle_H$, reads:

$$\sigma^*(\mathbf{x}, t)\,\boldsymbol{g} = \int_{\mathcal{D}} \breve{\sigma}^{\dagger}(\mathbf{x}, \mathbf{y}, t)\,\boldsymbol{g}(\mathbf{y})\,\mathrm{d}\mathbf{y}, \quad \boldsymbol{g} \in H, \quad \mathbf{x} \in \mathcal{D}. \tag{2b}$$

The composite operator $\sigma(\cdot,t)\sigma^*(\cdot,t)$ is trace class on $H$ and admits eigenfunctions $\boldsymbol{\xi}_n(\cdot,t)$ with eigenvalues $\lambda_n(t)$ satisfying $\sum_{n\in\mathbb{N}} \lambda_n(t) < +\infty$. The noise can then be equivalently defined by the spectral decomposition:

$$\sigma(\mathbf{x},t)\,\mathrm{d}B_t = \sum_{n\in\mathbb{N}} \lambda_n^{1/2}(t)\,\boldsymbol{\xi}_n(\mathbf{x},t)\,\mathrm{d}\beta_n(t), \tag{3}$$

where the $\beta_n$ are independent standard Brownian motions. In addition, we assume that the operator-valued process $\{\sigma(\cdot,t)\}_{0\le t\le T}$ is stochastically integrable, i.e. $\mathbb{P}\big(\int_0^T \sum_{n\in\mathbb{N}} \lambda_n(t)\,\mathrm{d}t < +\infty\big) = 1$. From [4], the stochastic integral $\big\{\int_0^t \sigma(\cdot, s)\,\mathrm{d}B_s\big\}_{0\le t\le T}$ is a continuous square integrable $H$-valued martingale, hence a centered Gaussian process, $\mathbb{E}_{\mathbb{P}}\big[\int_0^t \sigma(\cdot, s)\,\mathrm{d}B_s\big] = \mathbf{0}$, of bounded variance, $\mathbb{E}_{\mathbb{P}}\big\|\int_0^t \sigma(\cdot, s)\,\mathrm{d}B_s\big\|_H^2 < +\infty$. Moreover, the joint quadratic variation process of the noise, evaluated at the same point $\mathbf{x} \in \mathcal{D}$, is given by

$$\left\langle \int_0^{\cdot} \sigma(\mathbf{x}, s)\,\mathrm{d}B_s,\ \int_0^{\cdot} \sigma(\mathbf{x}, s)\,\mathrm{d}B_s \right\rangle_t = \int_0^t \boldsymbol{a}(\mathbf{x}, s)\,\mathrm{d}s, \tag{4a}$$

$$\boldsymbol{a}(\mathbf{x},t) = \int_{\mathcal{D}} \breve{\sigma}(\mathbf{x}, \mathbf{y}, t)\,\breve{\sigma}^{\dagger}(\mathbf{y}, \mathbf{x}, t)\,\mathrm{d}\mathbf{y} = \sum_{n \in \mathbb{N}} \lambda_n(t) \left(\boldsymbol{\xi}_n \boldsymbol{\xi}_n^{\dagger}\right)(\mathbf{x}, t). \tag{4b}$$

We remark that a real-valued noise can be obtained by adding the constraint that the eigenfunctions, the eigenvalues and the standard Brownian motions in (3) are organised in complex-conjugate pairs. In that case, the joint quadratic variation process is real-valued as well.
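The relation between the spectral expansion (3) and the quadratic variation (4b) can be verified by Monte-Carlo simulation. The sketch below samples the noise on a 1D periodic grid from a few illustrative Fourier eigenfunctions $\boldsymbol{\xi}_n$ with prescribed eigenvalues $\lambda_n$ (all stand-in choices, not data from the chapter) and compares the empirical quadratic variation with $\sum_n \lambda_n \xi_n^2(x)$.

```python
import numpy as np

# Sketch of the spectral noise (3) on a 1D periodic grid: eigenfunctions
# xi_n with eigenvalues lambda_n drive independent Brownian increments.
# The empirical quadratic variation is compared with the closed form (4b),
# a(x) = sum_n lambda_n * xi_n(x)^2. Grid and spectrum are illustrative.

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 64, endpoint=False)
lam = np.array([1.0, 0.5, 0.25])
xi = np.stack([np.sqrt(2) * np.sin(2 * np.pi * (n + 1) * x) for n in range(3)])

dt, n_steps = 1e-3, 20000
# increments sigma dB_t = sum_n sqrt(lam_n) * xi_n(x) * dbeta_n(t)
dbeta = rng.standard_normal((n_steps, 3)) * np.sqrt(dt)
noise = dbeta @ (np.sqrt(lam)[:, None] * xi)          # shape (n_steps, 64)

a_empirical = (noise ** 2).sum(axis=0) / (n_steps * dt)
a_exact = (lam[:, None] * xi ** 2).sum(axis=0)
# the two agree up to Monte-Carlo sampling error
```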

The previous formulation provides only a zero-mean, temporally uncorrelated noise. However, this might not be sufficient, and including a mean or time-correlated component of the unresolved velocity field could be of crucial importance to obtain a relevant model. For instance, the eddy parametrization proposed by [15] is decomposed into a deterministic mean term and a stochastic term of zero mean. For the double-gyre circulation configuration, the deterministic parametrization considered there allows the coarse-resolution model to reproduce the eastward jet, while the additional stochastic terms enhance the gyre circulation and improve the flow variability. Similarly, the random-forcing model proposed by [3] consists in a space-time correlated stochastic process enhancing the jet extension. The slow modes of the sub-grid scales can be provided by an adequate high-pass filtering of high-resolution data on the coarse grid. We aim in this work at investigating the incorporation of such slow components within the LU framework. However, the derivation of LU models [10, 12, 1] relies on the martingale properties of the centered noise, and we hence need to properly handle non-centred Brownian terms. The Girsanov transformation [4] provides a theoretical tool that fully warrants such a superposition: by a change of the probability measure, the composed noise can be centered with respect to a new probability measure while an additional drift term appears, which pulls back time-correlated sub-grid-scale components into the dynamical system. The associated mathematical description is given as follows. Let $\Gamma_t$ be an $H$-valued $\mathcal{F}_t$-predictable process satisfying the Novikov condition $\mathbb{E}_{\mathbb{P}}\big[\exp\big(\tfrac{1}{2}\int_0^T \|\Gamma_t\|_H^2\,\mathrm{d}t\big)\big] < +\infty$. Then the process $\big\{\widetilde{B}_t := B_t + \int_0^t \Gamma_s\,\mathrm{d}s\big\}_{0\le t\le T}$ is an $H$-valued cylindrical Wiener process on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0\le t\le T}, \widetilde{\mathbb{P}})$ with Radon-Nikodym derivative

$$\frac{\mathrm{d}\widetilde{\mathbb{P}}}{\mathrm{d}\mathbb{P}} = \exp\left(-\int_0^T \langle \Gamma_t, \mathrm{d}B_t \rangle_H - \frac{1}{2} \int_0^T \|\Gamma_t\|_H^2\,\mathrm{d}t\right). \tag{5a}$$

In this case, the SDE (1) under the probability measure $\widetilde{\mathbb{P}}$ reads:

$$\mathrm{d}X_t = \left(\boldsymbol{v}(X_t,t) - \sigma(X_t,t)\Gamma_t\right)\mathrm{d}t + \sigma(X_t,t)\,\mathrm{d}\widetilde{B}_t. \tag{5b}$$

In the present work, we shall rather consider this modified stochastic flow, defined on $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{0\le t\le T}, \widetilde{\mathbb{P}})$ with $\mathbb{E}_{\widetilde{\mathbb{P}}}[\sigma\mathrm{d}\widetilde{B}_t] = \mathbf{0}$, as the physical solution. Hereafter, $\sigma\Gamma_t$ is referred to as the *Girsanov drift*.

#### *2.2 Stochastic QG Model*

The evolution law of a random tracer (function) $\Theta$ transported along the stochastic flow, $\Theta(X_{t+\delta t}, t + \delta t) = \Theta(X_t, t)$, is derived in [10, 1]. Under the probability measure $\widetilde{\mathbb{P}}$, it can be described by the following stochastic partial differential equation (SPDE), namely

$$\mathbb{D}_t \Theta := \mathrm{d}_t \Theta + (\widetilde{\boldsymbol{v}}^{\star}\,\mathrm{d}t + \sigma\,\mathrm{d}\widetilde{B}_t)\cdot\nabla\Theta - \frac{1}{2}\nabla\cdot(\boldsymbol{a}\nabla\Theta)\,\mathrm{d}t = 0, \tag{6a}$$

$$\widetilde{\boldsymbol{v}}^{\star} := \boldsymbol{v} - \frac{1}{2}\nabla\cdot\boldsymbol{a} + \sigma^{*}(\nabla\cdot\sigma) - \sigma\,\Gamma_t. \tag{6b}$$

In this SPDE, the first term $\mathrm{d}_t\Theta(\mathbf{x}) := \Theta(\mathbf{x}, t + \delta t) - \Theta(\mathbf{x}, t)$ stands for the (forward) increment of $\Theta$ at a fixed point $\mathbf{x} \in \mathcal{D}$; the second term describes the tracer's advection by an *effective drift* $\widetilde{\boldsymbol{v}}^{\star}$ and by the noise $\sigma\mathrm{d}\widetilde{B}_t$; the last term depicts the tracer's diffusion through the noise quadratic variation $\boldsymbol{a}$. The three corrections in the effective drift (6b) ensue, respectively, from (*i*) the noise inhomogeneity, (*ii*) the possible unresolved flow divergence and (*iii*) the statistical correction due to the change of probability measure.

The derivation of stochastic geophysical models under the LU framework follows exactly the same path as the deterministic derivation, together with a proper scaling of the noise and of its amplitude. In particular, a continuously stratified QG model under LU has been derived in [12, 9] using an asymptotic approach. With horizontally moderate and vertically weak noise (see definitions in [12, 9]), the governing equations under the probability measure $\widetilde{\mathbb{P}}$ read:

Evolution of potential vorticity (PV):

$$\mathbb{D}_t q = \sum_{i=1,2} \mathrm{J}\Big( (\widetilde{\boldsymbol{u}}^{\star})^i\,\mathrm{d}t + (\sigma\,\mathrm{d}\widetilde{B}_t)^i,\ u^i \Big) - \left( \frac{1}{2}\nabla\cdot\big(\partial_{x_i}^{\perp}\boldsymbol{a}\,\nabla u^i\big) + \beta\,\partial_{x_i} a_{i2} \right)\mathrm{d}t, \tag{7a}$$

From PV to streamfunction:

$$\nabla^2 \psi + \partial_z \left( \frac{f_0^2}{N^2}\,\partial_z \psi \right) = q - \beta y, \tag{7b}$$

Incompressible constraints:

$$\boldsymbol{u} = \nabla^{\perp}\psi, \qquad \nabla\cdot\sigma\,\mathrm{d}\widetilde{B}_t = \nabla\cdot(\widetilde{\boldsymbol{u}}^{\star} - \boldsymbol{u}) = 0. \tag{7c}$$

Here, $\nabla = [\partial_x, \partial_y]^T$, $\nabla^{\perp} = [-\partial_y, \partial_x]^T$ and $\nabla^2 = \partial_{xx}^2 + \partial_{yy}^2$ denote two-dimensional operators, and $\mathrm{J}(f, g) = \partial_x f\,\partial_y g - \partial_x g\,\partial_y f$ stands for the Jacobian operator. The vector fields $\boldsymbol{u}$, $\sigma\mathrm{d}\widetilde{B}_t$ and the tensor field $\boldsymbol{a}$ are two-dimensional (2D) horizontal quantities. The horizontal effective drift is defined as $\widetilde{\boldsymbol{u}}^{\star} := \boldsymbol{u} - \nabla\cdot(\boldsymbol{a}/2) - \sigma\Gamma_t$. The scalar fields $q$ and $\psi$ represent the PV and the streamfunction. In Eq. (7b), $N^2 = -(g/\rho_0)\,\partial_z\rho$ is the Brunt-Väisälä (or buoyancy) frequency, with $g$ the gravity acceleration, $\rho_0$ the background density and $\rho$ the density anomaly, and $f_0 + \beta y$ is the Coriolis parameter under a beta-plane approximation. As shown in [1], one important characteristic of the random model (7) is that it conserves the total energy of the resolved flow (under natural boundary conditions) for any realization (i.e. pathwise). This property highlights a strong relation between the classical deterministic model and the stochastic formulation.
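The 2D operators above can be sketched with centred finite differences on a doubly periodic grid. This is an illustrative discretisation only (the chapter's solver in Sect. 4 uses a conservative flux form); it verifies the identities $\mathrm{J}(f,f) = 0$ and $\nabla\cdot\nabla^{\perp}\psi = 0$.

```python
import numpy as np

# Finite-difference sketch of the 2D operators in (7): the perpendicular
# gradient u = grad_perp(psi) = [-d_y psi, d_x psi] and the Jacobian
# J(f, g) = d_x f d_y g - d_x g d_y f, on a doubly periodic grid.

n, L = 128, 1.0
x1 = np.linspace(0.0, L, n, endpoint=False)
X, Y = np.meshgrid(x1, x1, indexing="ij")
dx = L / n

def ddx(f):  # centred derivative in x (axis 0)
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * dx)

def ddy(f):  # centred derivative in y (axis 1)
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * dx)

def jacobian(f, g):
    return ddx(f) * ddy(g) - ddx(g) * ddy(f)

psi = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)   # test streamfunction
u = np.stack([-ddy(psi), ddx(psi)])                    # u = grad_perp(psi)
div_u = ddx(u[0]) + ddy(u[1])                          # incompressibility
```

Because the roll-based derivatives commute, the discrete divergence of $\nabla^{\perp}\psi$ vanishes exactly, mirroring the incompressibility constraint (7c).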

#### **3 Numerical Parameterization of Unresolved Flow**

Data-driven approaches are presented in this section to estimate the spatial correlation functions of the unresolved flow component based on the spectral decomposition (3). In practice, we work with a finite set of functions representing the small-scale Eulerian velocity fluctuations rather than with Lagrangian particle trajectories. We first review the empirical orthogonal functions (EOF) method, for which the noise covariance is assumed quasi-stationary. We then propose an approach relying on the dynamic mode decomposition (DMD) to account for the temporal behavior of the spatial correlations.

#### *3.1 EOF-Based Method*

In the following, let $\{\boldsymbol{u}_{\mathrm{HR}}(\mathbf{x}, t_i)\}_{i=1,\dots,N}$ be the set of velocity snapshots provided by a high-resolution (HR) simulation. We first build the spatial local fluctuations $\boldsymbol{u}_f(\mathbf{x}, t_i)$ of each snapshot on the coarse-grid points. In particular, for the QG system (7), one can first perform a high-pass filtering with a 2D Gaussian convolution kernel $G$ on each HR streamfunction $\psi_{\mathrm{HR}}$ to obtain the streamfunction fluctuations, $\psi_f(\mathbf{x}, t_i) = \big((\mathrm{I} - G) * \psi_{\mathrm{HR}}\big)(\mathbf{x}, t_i)$ (only on the coarse-grid points $\mathbf{x}$). Then, the geostrophic velocity fluctuations can be derived by $\boldsymbol{u}_f = \nabla^{\perp}_{\mathrm{LR}}\psi_f$. We next centre the data set, $\boldsymbol{u}'_f = \boldsymbol{u}_f - \overline{\boldsymbol{u}_f}^t$ (with $\overline{\bullet}^t$ the temporal mean), and perform the EOF procedure [9] to get a set of orthogonal temporal modes $\{\alpha_m\}_{m=1,\dots,N}$ and orthonormal spatial modes $\{\boldsymbol{\phi}_m\}_{m=1,\dots,N}$ satisfying

$$\boldsymbol{u}'_f(\mathbf{x}, t_i) = \sum_{m=1}^N \alpha_m(t_i)\,\boldsymbol{\phi}_m(\mathbf{x}), \qquad \overline{\alpha_m \alpha_n}^t = \lambda_m \delta_{m,n}. \tag{8}$$

Truncating the modes (with $M \ll N$) and rescaling by a small-scale decorrelation time $\tau$, the stationary noise and its quadratic variation can be built as

$$\sigma(\mathbf{x})\,\mathrm{d}\widetilde{B}_t = \sqrt{\tau} \sum_{m=1}^{M} \sqrt{\lambda_m}\,\boldsymbol{\phi}_m(\mathbf{x})\,\mathrm{d}\beta_m(t), \qquad \boldsymbol{a}(\mathbf{x}) = \tau \sum_{m=1}^{M} \lambda_m\,\boldsymbol{\phi}_m(\mathbf{x})\boldsymbol{\phi}_m^T(\mathbf{x}). \tag{9}$$

Note that this time scale $\tau$ is used to ensure that the noise in (5b) has the physical dimension of a length. In practice, we often take the coarse-grid simulation timestep $\Delta t_{\mathrm{LR}}$. In addition, the Girsanov drift is set to $\sigma(\mathbf{x})\Gamma_t = \overline{\boldsymbol{u}_f}^t(\mathbf{x})$. This means that the Girsanov drift here is the projection of the temporal mean of the sub-grid scales onto the EOFs, i.e. $\sigma(\mathbf{x})\Gamma_t = \sum_{m=1}^N \gamma_m \boldsymbol{\phi}_m(\mathbf{x})$ with $\gamma_m = \big\langle \overline{\boldsymbol{u}_f}^t, \boldsymbol{\phi}_m \big\rangle_H$ satisfying $\sum_{m=1}^N \gamma_m^2 < +\infty$.
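The EOF decomposition (8) is, in matrix form, a thin SVD of the centred snapshot matrix. The sketch below uses a synthetic random snapshot matrix as a stand-in for the filtered velocity fluctuations and checks the two properties stated in (8): orthonormal spatial modes and uncorrelated temporal coefficients with variances $\lambda_m$.

```python
import numpy as np

# EOF sketch of (8) via SVD: centred snapshots (space x time) decompose as
# u'_f(x, t_i) = sum_m alpha_m(t_i) phi_m(x), with orthonormal spatial modes
# phi_m and uncorrelated temporal modes alpha_m of variance lambda_m.
# The snapshot matrix below is a synthetic illustrative stand-in.

rng = np.random.default_rng(2)
n_space, n_time = 200, 50
snapshots = rng.standard_normal((n_space, n_time))

mean = snapshots.mean(axis=1, keepdims=True)    # temporal mean (Girsanov drift)
centred = snapshots - mean

# thin SVD: columns of phi are the EOFs, alpha the temporal coefficients
phi, s, vt = np.linalg.svd(centred, full_matrices=False)
alpha = s[:, None] * vt                          # alpha_m(t_i)
lam = (alpha ** 2).mean(axis=1)                  # lambda_m = mean of alpha_m^2

gram = phi.T @ phi                               # should be the identity
cov = (alpha @ alpha.T) / n_time                 # should be diag(lambda_m)
```

A truncated noise basis as in (9) is then just the first $M$ columns of `phi` with the corresponding `lam` values.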

#### *3.2 DMD-Based Method*

The DMD algorithm [13] seeks a spectral decomposition of the best-fit linear operator $A$ that relates two consecutive snapshots:

$$\boldsymbol{u}'_f(\mathbf{x}, t_{i+1}) \approx A\,\boldsymbol{u}'_f(\mathbf{x}, t_i). \tag{10a}$$

Applying the exact DMD procedure proposed by [14], the corresponding spectral expansion in continuous time reads

$$\boldsymbol{u}'_f(\mathbf{x}, t) = \sum_{m=1}^{N} b_m \exp\big((\sigma_m + \mathrm{i}\,\omega_m)t\big)\,\boldsymbol{\varphi}_m(\mathbf{x}), \tag{10b}$$

where *<sup>ϕ</sup>m(x)* <sup>∈</sup> <sup>C</sup>*<sup>d</sup>* are the DMD modes (eigenvectors of *<sup>A</sup>*) associated to the DMD eigenvalues *μm* <sup>∈</sup> <sup>C</sup>, *σm* <sup>=</sup> log*(*|*μm*|*)/Δt*<sup>d</sup> <sup>∈</sup> <sup>R</sup> are the modes growth rate (with *Δts* <sup>=</sup> *ti*+<sup>1</sup> <sup>−</sup> *ti* the sampling step of data), *ωm* <sup>=</sup> arg*(μm)/Δts* <sup>∈</sup> <sup>R</sup> are the modes frequencies (with i the imaginary unit) and *bm* <sup>∈</sup> <sup>C</sup> are the modes amplitudes. In practice, our data set of velocity fluctuations is real valued, hence the DMD modes (also eigenvalues and amplitudes) are two-by-two complex conjugates, i.e. *ϕ*2*<sup>p</sup>* = *ϕ*2*p*−<sup>1</sup> *(p* = 1*,...,N/*2*)*.

We next propose to split the total set of DMD modes into two subsets, $\mathcal{M}^c$ and $\mathcal{M}^r$, in order to select separately adequate fast and slow modes for the noise (from $\mathcal{M}^r$) and for the Girsanov drift (from $\mathcal{M}^c$), respectively, according to the following analysis of frequencies and amplitudes:

$$\mathcal{M}^c = \left\{ m \in [1, N] \,\middle|\, |\mu_m| \approx 1,\ |\omega_m| \le \frac{\pi}{\tau_c},\ |b_m| \ge C \right\}, \tag{11a}$$

$$\mathcal{M}^r = \left\{ m \in [1, N] \,\middle|\, |\mu_m| \approx 1,\ |\omega_m| > \frac{\pi}{\tau_c},\ |b_m| \ge C \right\}, \tag{11b}$$

where $\tau_c$ is a temporal separation scale that can be estimated from the spatial mean of the autocorrelation functions of the data, and $C$ denotes an empirical cutoff on the amplitudes. The DMD modes that are included neither in $\mathcal{M}^c$ nor in $\mathcal{M}^r$ are discarded. An example of the spectrum and amplitudes of the selected DMD modes is shown in Fig. 1. In order to avoid spurious effects associated with the non-orthogonality of DMD modes, their amplitudes are rescaled such that the reconstructed data corresponds to

**Fig. 1** Illustration of the selections of DMD modes used for the noise (orange) and the Girsanov drift (blue)

an orthogonal projection onto the subspace spanned by the modes in $\mathcal{M}^c$ or $\mathcal{M}^r$. In particular, we propose to rescale these truncated DMD modes as follows:


This procedure is applied separately to the DMD modes of $\mathcal{M}^c$ and $\mathcal{M}^r$. Finally, the noise and the correction drift can be defined as

$$\sigma(\mathbf{x}, t)\,\mathrm{d}\widetilde{B}_t = \sqrt{\tau} \sum_{m\in\mathcal{M}^r} \exp(\mathrm{i}\,\omega_m t)\,\boldsymbol{\varphi}_m(\mathbf{x})\,\mathrm{d}\beta_m(t), \tag{12a}$$

$$\sigma(\mathbf{x}, t)\,\Gamma_t = \overline{\boldsymbol{u}_f}^t(\mathbf{x}) + \sum_{m \in \mathcal{M}^c} \exp(\mathrm{i}\,\omega_m t)\,\boldsymbol{\varphi}_m(\mathbf{x}). \tag{12b}$$

In particular, we assume that each pair of complex Brownian motions are conjugate ($\beta_{2p} = \overline{\beta}_{2p-1}$) and that their real and imaginary parts are independent. As such, both the noise $\sigma\mathrm{d}\widetilde{B}_t$ and the correction drift $\sigma\Gamma_t$ are real-valued fields. In addition, the joint quadratic variation of such a noise remains stationary:

$$\boldsymbol{a}(\mathbf{x}) = \tau \sum_{m \in \mathcal{M}^r} \boldsymbol{\varphi}_m(\mathbf{x})\,\boldsymbol{\varphi}_m^{\dagger}(\mathbf{x}). \tag{12c}$$

In a similar way as for the EOF-based method, we could also construct the Girsanov drift as the projection of the right-hand side of (12b) onto the DMD modes. Since we have dropped the unstable DMD modes, one can show that the predictability and the Novikov condition (presented in Sect. 2) of $\Gamma$ hold in this case.
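The DMD-based construction of this subsection can be sketched end-to-end: the selection rules (11a)-(11b) split near-neutral modes at the cutoff frequency $\pi/\tau_c$, the drift (12b) superposes the slow harmonics on the temporal mean, and the noise (12a) modulates the fast harmonics with Brownian increments. All numerical values below (tolerances, toy spectrum, mode shapes, $\tau$) are illustrative assumptions, and only the real part is kept, as the conjugate-pair convention would enforce.

```python
import numpy as np

# Sketch of Sect. 3.2: mode selection (11a)-(11b), then the Girsanov drift
# (12b) and one noise increment (12a) built from the selected DMD modes.

rng = np.random.default_rng(0)

def select_modes(mu, b, dt_s, tau_c, C, tol=0.02):
    """Split near-neutral modes (|mu|~1, |b|>=C) at frequency pi/tau_c."""
    omega = np.angle(mu) / dt_s
    keep = (np.abs(np.abs(mu) - 1.0) <= tol) & (np.abs(b) >= C)
    slow = np.abs(omega) <= np.pi / tau_c
    return np.flatnonzero(keep & slow), np.flatnonzero(keep & ~slow)

# toy spectrum: a slow neutral mode, a fast neutral mode, a damped mode
# and a weak-amplitude mode
dt_s, tau_c, C = 0.1, 5.0, 0.5
mu = np.array([np.exp(0.05j), np.exp(1.0j), 0.8 * np.exp(0.3j), np.exp(0.4j)])
b = np.array([2.0, 1.5, 3.0, 0.1])
M_c, M_r = select_modes(mu, b, dt_s, tau_c, C)

# drift (12b) and one noise increment (12a) on a 1D grid
xg = np.linspace(0.0, 1.0, 32, endpoint=False)
phi = np.exp(2j * np.pi * np.arange(1, 5)[:, None] * xg[None, :])
omega = np.angle(mu) / dt_s
u_mean = 0.1 * np.sin(2 * np.pi * xg)    # temporal mean of u_f
tau, dt, t = 1.0, 1e-2, 0.5

drift = u_mean + sum(
    2.0 * np.real(np.exp(1j * omega[m] * t) * phi[m]) for m in M_c)
dbeta = rng.standard_normal(len(M_r)) + 1j * rng.standard_normal(len(M_r))
noise_incr = sum(
    2.0 * np.real(np.sqrt(tau) * np.exp(1j * omega[m] * t)
                  * phi[m] * dbeta[k] * np.sqrt(dt / 2))
    for k, m in enumerate(M_r))
```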

#### **4 Numerical Experiments**

In this section, we present some numerical results for the stochastic QG system (7). The objective is to improve the variability of large-scale models defined on coarse grids. To that end, a high-resolution deterministic reference model (*REF*) is first simulated and compared to several coarse-resolution models: the benchmark deterministic model (*DET*) and two stochastic models, with an EOF-based noise (*STO-EOF*) and a DMD-based noise (*STO-DMD*).

#### *4.1 Configurations*

In this study, we consider the vertically discretized QG dynamical core proposed in [8] and extended to the stochastic setting in [9]. This model consists of $n$ isopycnal layers with constant thickness $H_k$ and density $\rho_k$ in each layer $k$. In this case, the prognostic variables such as $\psi$ in (7) are assumed to be layer-averaged quantities. Homogeneous Dirichlet boundary conditions have been imposed for the term $f_0\partial_z\psi/N^2$ in (7b) at the ocean surface and bottom. Moreover, external forcing and numerical dissipation are included in the evolution of PV (7a): the Ekman pumping $\nabla^{\perp}\cdot\boldsymbol{\tau}$ due to the wind stress $\boldsymbol{\tau}$ over the ocean surface boundary, a linear drag $-(f_0\eta_{\mathrm{ek}}/2)\nabla^2\psi_n$ at the ocean bottom with a very thin Ekman-layer thickness $\eta_{\mathrm{ek}}$, and a biharmonic dissipation $-A_4\nabla^4(\nabla^2\psi_k)$ in each layer with uniform coefficient $A_4$. In particular, we consider here a finite box ocean driven by an idealized (stationary and symmetric) wind stress $\boldsymbol{\tau} = [-\tau_0\cos(2\pi y/L_y),\ 0]^T$. A mixed horizontal boundary condition is used for the $k$-th layer streamfunction: $\psi_k|_{\partial\mathcal{A}} = f_k(t)$ and $\partial_n^2\psi_k|_{\partial\mathcal{A}} = -(\alpha_{\mathrm{bc}}/\Delta x)\,\partial_n\psi_k|_{\partial\mathcal{A}}$ (and similarly for the 4-th order derivative). Here, $\mathcal{A}$ denotes the 2D area, $f_k$ is a time-dependent function constrained by mass conservation [7], $\Delta x$ stands for the horizontal resolution, and $\alpha_{\mathrm{bc}}$ is a nondimensional coefficient associated with the slip conditions [7]. A quiescent initial condition is used for *REF*, whereas a spin-up condition downsampled from *REF* (after a 90-year integration) is adopted for all the coarse-resolution models. The common parameters for all the simulations are listed in Table 1, whereas resolution-dependent parameters are presented separately in Table 2.
Both EOF and DMD modes are calibrated from 40 years of *REF* data (after the spin-up) with a 5-day sampling step. As for the numerical discretization, a conservative flux form [9] together with a stochastic Leapfrog scheme [5] is adopted for the evolution of PV (7a). The inversion of the modified Helmholtz equation (7b) is carried out with a discrete sine transform method [7].


**Table 1** Common parameters for all the models. The buoyancy frequency $N^2$ in (7b) is approximated by $g'_{k+0.5}/\big((H_k + H_{k+1})/2\big)$ on the interface between layers $k$ and $k+1$

**Table 2** Values of grid-varying parameters. The energy proportion captured by the truncated EOF modes is given in brackets. For the DMD method, the first number stands for the size of $\mathcal{M}^c$ (11a), whereas the second is that of $\mathcal{M}^r$ (11b)


**Fig. 2** Snapshots of surface PV provided by different simulations after 60-years integration. The black arrows are the interpolated geostrophic velocities

Snapshots of the surface PV provided by the different simulations are shown in Fig. 2. The dynamics of the *REF* (5 km) model is mainly characterized by a meandering eastward jet with adjacent recirculations, which results from the effect of the most active mesoscale eddies through baroclinic instability. However, this effect cannot be properly resolved once the horizontal resolution exceeds the maximum baroclinic deformation radius (39 km here). For instance, the *DET* (80 km) simulation generates only a smooth symmetric field. On the other hand, both the *STO-EOF* and *STO-DMD* models are able to reproduce the eastward jet on the coarse mesh (80 km) by including the non-linear effects carried both by the unresolved noise and by the correction drift. In particular, the *STO-DMD* model produces a stronger meridional perturbation along the jet and is able to capture some of the large-wave structures predicted by the *REF* model. The improvements brought by these random models are diagnosed and analyzed more precisely in the following.

#### *4.2 Diagnostics*

We first compare the long-term mean (over a 100-year interval) of the kinetic energy (KE) spectrum for the coarse models at different resolutions (40, 80,

**Fig. 3** Temporal mean of vertically integrated KE spectra for the different models

120 km). As shown in Fig. 3, introducing only a dissipation mechanism such as the biharmonic viscosity in the *DET* coarse models leads to an excessive decrease of the resolved KE compared to the *REF* model. Both the *STO-EOF* and *STO-DMD* models, at the different resolutions, recover a given amount of the lost energy over all wavenumbers. In particular, the *STO-DMD* models provide a higher KE backscattering at large scales and a better spectrum slope in the inertial range than the stationary unresolved models. This seems to highlight the importance of the non-stationary character of the noise and of the Girsanov drift.

We then quantify the temporal variability (over the same 100-year interval) predicted by the different coarse models. In this work, we adopt the following three global metrics. The first one is the root-mean-square error (RMSE) between the standard deviation of the streamfunction of a coarse model (denoted by $\sigma[\psi^{\mathrm{M}}]$) and that of the subsampled high-resolution one (denoted by $\sigma[\psi^{\mathrm{R}}]$), $\big\|\sigma[\psi^{\mathrm{M}}] - \sigma[\psi^{\mathrm{R}}]\big\|_{L^2(\mathcal{D})}$, where $\mathcal{D} = \mathcal{A} \times [-H, 0]$ and $H$ stands for the total depth of the ocean basin. The second criterion is the Gaussian relative entropy (GRE) [6], which assesses in a single measure the mean and variance reconstruction:

$$\mathrm{GRE} = \frac{1}{|\mathcal{D}|} \int_{\mathcal{D}} \frac{1}{2} \left( \frac{\big(\overline{\psi^{\mathrm{M}}}^t - \overline{\psi^{\mathrm{R}}}^t\big)^2}{\sigma^2[\psi^{\mathrm{M}}]} + \frac{\sigma^2[\psi^{\mathrm{R}}]}{\sigma^2[\psi^{\mathrm{M}}]} - 1 - \log\left( \frac{\sigma^2[\psi^{\mathrm{R}}]}{\sigma^2[\psi^{\mathrm{M}}]} \right) \right) \mathrm{d}\mathbf{x}. \tag{13}$$

It is clear that a coarse model with high variability will have low RMSE and GRE, whereas a poor variability will lead to a large RMSE and GRE. The last metric measures the eddy kinetic energy (EKE), $(\rho_0/2)\,\|\boldsymbol{u}'\|^2_{(L^2(\mathcal{D}))^2}$, where $\boldsymbol{u}' := (I - \mathcal{F}_t)[\boldsymbol{u}]$ is the eddy velocity obtained through a 2-year low-pass filter $\mathcal{F}_t$ applied at every point in space. For comparison purposes, we show here only the time average of this metric ($\overline{\mathrm{EKE}}$) for the different models.
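The GRE (13) is a pointwise Gaussian Kullback-Leibler divergence averaged over the domain, so it vanishes exactly when the coarse model matches the reference mean and variance, and is positive otherwise. A minimal sketch, with synthetic stand-in fields for the streamfunction statistics:

```python
import numpy as np

# Sketch of the Gaussian relative entropy (13): a single number comparing
# the pointwise mean and variance of a coarse-model field against the
# reference. The fields below are illustrative stand-ins.

def gre(mean_m, var_m, mean_r, var_r):
    """Domain-averaged Gaussian relative entropy of (13)."""
    ratio = var_r / var_m
    pointwise = 0.5 * ((mean_m - mean_r) ** 2 / var_m
                       + ratio - 1.0 - np.log(ratio))
    return pointwise.mean()

rng = np.random.default_rng(5)
mean_r = rng.standard_normal(100)
var_r = 1.0 + rng.random(100)

gre_perfect = gre(mean_r, var_r, mean_r, var_r)        # matching statistics
gre_biased = gre(mean_r + 0.5, 0.5 * var_r, mean_r, var_r)
```

A perfect model gives `gre_perfect` equal to zero, while the biased, under-dispersive model gives a strictly positive value.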

These three criteria are shown in Fig. 4 as bar plots. The *DET* models show very high RMSE and GRE together with a very low EKE, meaning that they produce poor variability in time and fail to represent the eddy effects. Compared to *STO-EOF*, the *STO-DMD* models significantly increase the internal variability and the eddy energy. Moreover, these improvements hold across resolutions. As shown

**Fig. 4** Comparison of variability measures for different coarse models. The *y*-axis of the last two figures are in log-scales

in Table 2, for a similar level of captured energy, the *STO-DMD* models require far fewer modes than *STO-EOF*, which first reduces the memory cost. Then, in terms of computational cost at each step, the former requires generating fewer Gaussian variables than the latter, and hence also reduces the dimension of the matrix-vector multiplication in the spectral decomposition (3).

#### *4.3 Discussion*

In order to distinguish the contributions of the correlated Girsanov drift and of the uncorrelated noise, three additional benchmark runs (at 80 km resolution) have been performed and compared to the proposed *STO-DMD* model: (i) *STO-DMD* without any correlated drift (i.e. $\sigma\Gamma_t = 0$); (ii) *STO-DMD* with only $\sigma\Gamma_t = \overline{\boldsymbol{u}_f}^t$; (iii) a simplified deterministic version of the proposed *STO-DMD* model, denoted *DET-DMD*, which only encodes the (full) correlated drift $\sigma\Gamma_t$ into the *DET* model. We remark that for the first two runs the DMD modes used for the correlated drift in the previous stochastic model are now included in the noise component. As shown in Fig. 5, run (i) fails to reproduce the eastward jet on the coarse mesh, whereas the other runs succeed. However, run (ii) produces results similar to the *STO-EOF* model (see Fig. 2) with a lower improvement of variability, and run (iii) captures more waves than the others, yet leads to a reduction of the jet magnitude compared to the proposed *STO-DMD* model. In particular, comparing the KE spectra of the different runs, Fig. 6 illustrates that the simplified *DET-DMD* model produces a backscattering of KE from small to large scales, and that the proposed *STO-DMD* enhances this result with a significantly higher KE at large scales. We observe a consistent conclusion for the EKE budget (see Fig. 6). These comparisons demonstrate that both the correlated drift ($\sigma\Gamma_t$) and the uncorrelated noise ($\sigma\mathrm{d}\widetilde{B}_t$) contribute to the prediction of large-scale patterns and to the improvement of the variability of the large-scale models.

**Fig. 5** Snapshots of surface PV provided by different simulations after 60-years integration. These four figures (from left to right) correspond to the benchmark runs (i), (ii), (iii) and the proposed *STO-DMD* model

**Fig. 6** Comparison of KE spectra and layered EKE (only horizontally integrated) for different coarse models

#### **5 Conclusions**

The proposed stochastic parameterization has been successfully implemented in a well-established QG dynamical core. Different noises defined from high-resolution data have been considered. An additional correction drift, ensuing from a change of probability measure, has been introduced. This non-intuitive term appears to be quite important for reproducing the eastward jet within the wind-driven double-gyre circulation. Furthermore, the DMD procedure has been adopted to represent the quasi-periodic dynamics of the unresolved flow. The resulting random model enables us to improve the intrinsic variability of the large-scale resolved flow.

**Acknowledgments** The authors acknowledge the support of the ERC EU project 856408-STUOD. The source code can be found at https://github.com/matlong/qgcm\_lu.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Deep Learning for the Benes Filter**

#### **Alexander Lobbe**

**Abstract** The filtering problem is concerned with the optimal estimation of a hidden state given partial and noisy observations. Filtering is extensively studied in the theoretical and applied mathematical literature. One of the central challenges in filtering today is the numerical approximation of the optimal filter. Here, accurate and fast methods are actively sought after, especially for such high-dimensional settings as numerical weather prediction, for example. In this paper we present a brief study of a new numerical method based on the mesh-free neural network representation of the density of the solution of the filtering problem achieved by deep learning. Based on the classical SPDE splitting method, our algorithm includes a recursive normalisation procedure to recover the normalised conditional distribution of the signal process. The present work uses the Benes model as a benchmark. The Benes filter is a well-known continuous-time stochastic filtering model in one dimension that has the advantage of being explicitly solvable. Within the analytically tractable setting of the Benes filter, we discuss the role of nonlinearity in the filtering model equations for the choice of the domain of the neural network. Further, we present the first study of the neural network method with an adaptive domain for the Benes model.

**Keywords** Nonlinear filtering · Deep learning · Stochastic PDE approximation

#### **1 Introduction**

Stochastic Filtering, i.e. the estimation of a signal process given only partial and noisy observations, is a well-studied problem, both in the theoretical and applied literature. It is relevant in many practical domains, for example in numerical weather prediction. Therefore, there is a high demand for efficient numerical methods to

A. Lobbe (-)

Department of Mathematics, Imperial College London, London, UK e-mail: alex.lobbe@imperial.ac.uk

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_12

approximate the optimal filter. Many such methods are known in the literature; among them, the SPDE splitting method can be used to solve the filtering problem in low dimensions. The inefficiency of the splitting method in higher dimensions stems from the fact that the underlying state space must be explicitly discretised. This is problematic because the required set of discretisation points, known as the *mesh*, grows exponentially with the dimension of the state space. For this reason, the authors of [4] present a modified splitting method for the filtering problem that does not rely on an explicit space discretisation. The method developed in [4] is therefore called *mesh-free* and relies on a neural network representation of the solution. This means that, instead of approximating the values of the solution on a discrete mesh, we can optimise the parameters of a neural network defined on the state space itself.
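The exponential growth is easy to quantify: with a fixed number of points per dimension, the mesh size is that number raised to the dimension. A trivial illustration:

```python
def mesh_size(points_per_dim: int, d: int) -> int:
    """Total node count of a tensor-product mesh in d dimensions."""
    return points_per_dim ** d

# 100 points per axis is modest in 1-D but hopeless in 10-D:
sizes = {d: mesh_size(100, d) for d in (1, 2, 3, 6, 10)}
```

Already at `d = 10` the mesh has 10^20 nodes, far beyond any realistic memory budget, which is what motivates the mesh-free representation.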

In this paper we present a further study of the deep learning method developed in [4], using the Benes filter as an example. The algorithm is derived from the classical splitting method for SPDEs, which consists of a deterministic PDE approximation step and a normalisation step that incorporates the randomness of the SPDE. Our algorithm replaces the PDE approximation step of the splitting method by a neural network representation and learning algorithm. Combined with the Monte-Carlo method for the normalisation step, this method becomes completely mesh-free. Furthermore, an important property of the methodology in the filtering context is the ability to iterate it over several time steps. This allows the algorithm to be run *online* and to successively process observations arriving sequentially. In order to be computationally feasible, the domain of the neural network needs to be restricted. This restricted domain needs to cover the support of the density as well as possible in order to yield a sensible solution. In [4] the neural network domain is fixed a priori and does not move with the solution. This presents two problems. First, the domain is unnecessarily large if it must cover the support over all timesteps. Second, the solution may eventually move outside the computational domain, rendering the approximation inadequate. It was therefore noted in [4] that a possible extension of the approximation method would be an adaptive domain as the support of the neural network. We present in this work the first results obtained using an adaptive domain in the nonlinear and analytically tractable case of the Benes filter.

The paper is structured as follows. In Sect. 1.1 we briefly introduce the nonlinear, continuous-time stochastic filtering framework. The setting is identical to the one assumed in [4] and the reader may consult [1] for an in-depth treatment of stochastic filtering. Thereafter, in Sect. 2.2, we formulate the Benes filtering model used as a benchmark. Then, in Sect. 1.2 we introduce the filtering equation and the classical SPDE splitting method. This is the method upon which the new algorithm in [4] was built.

Next, in Sect. 2 we present an outline of the derivation of the new methodology. For details, the reader is referred to the original article [4]. The first idea of the algorithm, presented in Sect. 2.1, is to reformulate the solution of the PDE for the density of the unnormalised filter as an expected value. This is done using the Feynman–Kac formula, based on an auxiliary diffusion process derived from the model equations. Moreover, in Sect. 2.3 we briefly specify the neural network parameters used in the method, as well as the employed loss function. The theoretical part of the paper is concluded with Sect. 2.4, where we show how to normalise the obtained neural network from the prediction step using Monte-Carlo approximation for linear sensor functions.

Section 3 contains the detailed parameter values and results of the numerical studies that we performed. Specifically, we perform two experiments. The first one, Sect. 3.1, is carried out without any domain adaptation and highlights the limitations of an ad-hoc parameterization of the domain. It is a simulation of the Benes filter using the deep learning method over a larger domain, as well as a longer time interval, than in the paper [4]. In particular, the size of the domain was estimated using the exact solution of the Benes model. This is necessary because the nonlinearity of the Benes model makes it difficult to know the evolution of the posterior a priori; a domain chosen in an ad-hoc way would thus have to be much larger. The second experiment, in Sect. 3.2, reports the performance of the proposed framework with domain adaptation. The adaptation was performed using precomputed estimates of the support of the filter, obtained from the solution formula for the Benes filter.

Finally, we formulate the conclusions from our experiments in Sect. 4. In short, the domain adapted method was more effective in resolving the bimodality in our study than the non-domain adapted one. However, this came at the cost of a linear trend in the error.

#### *1.1 Nonlinear Stochastic Filtering Problem*

The stochastic filtering framework consists of a pair of stochastic processes $(X, Y)$ on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with a normal filtration $(\mathcal{F}_t)_{t \ge 0}$, modelled, $\mathbb{P}$-a.s., as

$$X_t = X_0 + \int_0^t f(X_s) \,\mathrm{d}s + \int_0^t \sigma(X_s) \,\mathrm{d}V_s\,, \tag{1}$$

and

$$Y_t = \int_0^t h(X_s) \,\mathrm{d}s + W_t\,. \tag{2}$$

Here, the time parameter is $t \in [0,\infty)$, $d, p, m \in \mathbb{N}$, and $f\colon \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma\colon \mathbb{R}^d \to \mathbb{R}^{d\times p}$ are the drift and diffusion coefficient functions of the signal. The processes $V$ and $W$ are $p$- and $m$-dimensional independent, $(\mathcal{F}_t)_{t\ge 0}$-adapted Brownian motions. We call $X$ the *signal process* and $Y$ the *observation process*. The function $h\colon \mathbb{R}^d \to \mathbb{R}^m$ is often called the *sensor function*, or *link function*, because it models the possibly nonlinear connection between the signal and observation processes. Further, consider the *observation filtration* $(\mathcal{Y}_t)_{t\ge 0}$ given as

$$\mathcal{Y}_t = \sigma(Y_s,\ s \in [0, t]) \vee \mathcal{N} \quad \text{and} \quad \mathcal{Y} = \sigma\left(\bigcup_{t \in [0, \infty)} \mathcal{Y}_t\right),$$

where $\mathcal{N}$ are the $\mathbb{P}$-nullsets of $\mathcal{F}$. The aim of nonlinear filtering is to compute the probability-measure-valued, $(\mathcal{Y}_t)_{t\ge 0}$-adapted stochastic process $\pi$ defined by the requirement that for all bounded measurable test functions $\varphi\colon \mathbb{R}^d \to \mathbb{R}$ and $t \in [0,\infty)$ we have, $\mathbb{P}$-a.s., that

$$\pi_t \varphi = \mathbb{E}\left[\varphi(X_t) \,\middle|\, \mathcal{Y}_t\right].$$

We call *π* the *filter*.
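A signal/observation pair of the form (1)–(2) can be simulated with the Euler–Maruyama scheme; the one-dimensional sketch below uses illustrative coefficients, not any particular model from this paper:

```python
import numpy as np

def simulate_signal_observation(f, sigma, h, x0, dt, n_steps, rng):
    """Euler-Maruyama discretisation of the 1-D system (1)-(2):
    dX = f(X) dt + sigma(X) dV,   dY = h(X) dt + dW."""
    x = np.empty(n_steps + 1)
    y = np.empty(n_steps + 1)
    x[0], y[0] = x0, 0.0
    for k in range(n_steps):
        dV = rng.standard_normal() * np.sqrt(dt)   # signal noise increment
        dW = rng.standard_normal() * np.sqrt(dt)   # observation noise increment
        x[k + 1] = x[k] + f(x[k]) * dt + sigma(x[k]) * dV
        y[k + 1] = y[k] + h(x[k]) * dt + dW
    return x, y

rng = np.random.default_rng(1)
x, y = simulate_signal_observation(
    f=lambda x: -x, sigma=lambda x: 0.5, h=lambda x: 2.0 * x,
    x0=0.0, dt=0.01, n_steps=400, rng=rng)
```

The observation increments `y[k+1] - y[k]` are exactly what the discretised normalisation step of the splitting method consumes later on.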

Furthermore, let the process $Z$ be defined such that for all $t \in [0,\infty)$,

$$Z_t = \exp\left\{-\int_0^t h(X_s) \,\mathrm{d}W_s - \frac{1}{2} \int_0^t h(X_s)^2 \,\mathrm{d}s\right\}.$$

Then, assuming that

$$\mathbb{E}\left[\int\_0^t h(X\_s)^2 \,\mathrm{d}s\right] < \infty \quad \text{and} \quad \mathbb{E}\left[\int\_0^t Z\_s h(X\_s)^2 \,\mathrm{d}s\right] < \infty,$$

we have that $Z$ is an $(\mathcal{F}_t)_{t\ge 0}$-martingale, and by the change of measure (for details, see [1]) given by

$$\left.\frac{\mathrm{d}\tilde{\mathbb{P}}_t}{\mathrm{d}\mathbb{P}}\right|_{\mathcal{F}_t} = Z_t\,, \quad t \ge 0,$$

the processes $X$ and $Y$ are independent under $\tilde{\mathbb{P}}$, and $Y$ is a $\tilde{\mathbb{P}}$-Brownian motion. Here, $\tilde{\mathbb{P}}$ is the consistent measure defined on $\bigvee_{t \in [0,\infty)} \mathcal{F}_t$. Finally, under $\tilde{\mathbb{P}}$, we can define the measure-valued stochastic process $\rho$ by the requirement that for all bounded measurable functions $\varphi\colon \mathbb{R}^d \to \mathbb{R}$ and $t \in [0,\infty)$ we have, $\tilde{\mathbb{P}}$-a.s., that

$$\rho_t \varphi = \mathbb{E}\left[\varphi(X_t) \exp\left\{\int_0^t h(X_s) \,\mathrm{d}Y_s - \frac{1}{2} \int_0^t h(X_s)^2 \,\mathrm{d}s\right\} \,\middle|\, \mathcal{Y}_t\right]. \tag{3}$$

The Kallianpur–Striebel formula (see [1]) justifies calling $\rho$ the *unnormalised filter*.

#### *1.2 Filtering Equation and General Splitting Method*

Note that under the conditions given in [4], $X$ admits the infinitesimal generator $A\colon \mathcal{D}(A) \to B(\mathbb{R}^d)$ given, for all $\varphi \in \mathcal{D}(A)$, by


$$A\varphi = \langle f, \nabla \varphi \rangle + \text{Tr}(a \operatorname{Hess} \varphi), \tag{4}$$

where $\mathcal{D}(A)$ denotes the domain of the differential operator $A$ and $a = \frac{1}{2}\sigma\sigma^{\mathsf{T}}$. The symbol $B(\mathbb{R}^d)$ denotes the set of real-valued, bounded, Borel-measurable functions defined on $\mathbb{R}^d$.

It is well known (see, e.g., [1]) that the unnormalised filter $\rho$ satisfies the *filtering equation*, i.e. for all $t \ge 0$, we have $\tilde{\mathbb{P}}$-a.s. that

$$\rho_t(\varphi) = \pi_0(\varphi) + \int_0^t \rho_s(A\varphi) \,\mathrm{d}s + \int_0^t \rho_s(\varphi h^{\mathsf{T}}) \,\mathrm{d}Y_s. \tag{5}$$

The classical splitting method for the filtering equation is given in [3] and seeks to approximate the following SPDE for the density $p_t$ of the unnormalised filter, given for all $t \ge 0$, $x \in \mathbb{R}^d$, and $\tilde{\mathbb{P}}$-a.s. as

$$p_t(x) = p_0(x) + \int_0^t A^* p_s(x) \,\mathrm{d}s + \int_0^t h^{\mathsf{T}}(x) p_s(x) \,\mathrm{d}Y_s$$

and relies on the splitting-up algorithm described in [9] and [10]. Here $A^*$ is the formal adjoint of the infinitesimal generator $A$ of the signal process $X$.

We summarise the splitting-up method below in Note 1.

*Note 1* The splitting method for the filtering problem is defined by iterating the steps below with initial density $p^0(\cdot) = p_0(\cdot)$:

1. *(Prediction)* Compute an approximation $\tilde{p}^n$ of the solution to

$$\begin{aligned} \frac{\partial q^n}{\partial t}(t, z) &= A^\* q^n(t, z), \quad (t, z) \in (t\_{n-1}, t\_n] \times \mathbb{R}^d, \\ q^n(0, z) &= p^{n-1}(z), \qquad z \in \mathbb{R}^d, \end{aligned} \tag{6}$$

at time $t_n$, and

2. *(Normalisation)* Compute the normalisation constant with $z_n = (Y_{t_n} - Y_{t_{n-1}})/(t_n - t_{n-1})$ and the function

$$\mathbb{R}^d \ni z \mapsto \xi\_n(z) = \exp\left(-\frac{t\_n - t\_{n-1}}{2} ||z\_n - h(z)||^2\right),$$

so that we can set,

$$p^n(z) = \frac{1}{C_n} \xi_n(z) \tilde{p}^n(z), \quad z \in \mathbb{R}^d,$$

where $C_n = \int_{\mathbb{R}^d} \xi_n(z)\tilde{p}^n(z) \,\mathrm{d}z$.

The deep learning method studied below replaces the prediction step of the splitting method above by a deep neural network approximation algorithm to avoid an explicit space discretisation. This is achieved by representing each $\tilde{p}^n(z)$ by a feed-forward neural network and approximating the initial value problem (6) based on its stochastic representation using a sampling procedure. The normalisation step may then be computed either using quadrature or, to preserve the mesh-free characteristic, by Monte-Carlo approximation.
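In one dimension and on a grid, one cycle of the two steps in Note 1 can be sketched as follows; the prediction step is left as a placeholder solver (here the identity, purely to show the recursion), and the names `predict` and `splitting_step` are hypothetical, not taken from the authors' code:

```python
import numpy as np

def splitting_step(p_prev, z_grid, predict, y_incr, dt, h):
    """One prediction/normalisation cycle of the splitting method (Note 1)."""
    p_tilde = predict(p_prev, z_grid, dt)                # step 1: prediction
    z_n = y_incr / dt                                    # rescaled observation increment
    xi = np.exp(-0.5 * dt * (z_n - h(z_grid)) ** 2)      # step 2: likelihood weight
    dz = z_grid[1] - z_grid[0]
    C_n = np.sum(xi * p_tilde) * dz                      # normalisation constant
    return xi * p_tilde / C_n

z = np.linspace(-5.0, 5.0, 2001)
dz = z[1] - z[0]
p0 = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)            # Gaussian initial density
identity_predict = lambda p, z_grid, dt: p               # placeholder FP solver
p1 = splitting_step(p0, z, identity_predict, y_incr=0.05, dt=0.1, h=lambda x: x)
```

By construction the output integrates to one on the grid, which is exactly the role of $C_n$; the deep learning method swaps `predict` for a neural network trained on the Feynman–Kac representation.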

#### **2 Derivation and Outline of the Deep Learning Algorithm**

Here, we present a concise version of the derivation laid out in detail in [4].

#### *2.1 Feynman–Kac Representation*

Assuming sufficient differentiability of the coefficient functions, the operator $A^*$ may be expanded such that for all compactly supported smooth test functions $\varphi \in C_c^\infty(\mathbb{R}^d, \mathbb{R})$ we have

$$A^\*\varphi = \text{Tr}(a\,\text{Hess}\,\varphi) + \langle 2\overrightarrow{\text{div}}(a) - f, \text{grad}\,\varphi \rangle + \text{div}(\overrightarrow{\text{div}}(a) - f)\varphi. \tag{7}$$

Subtracting the zero-order term from (7), we obtain an operator that generates the auxiliary diffusion process, denoted $\hat{X}$, which is instrumental in the deep learning method.

**Definition 1** Define the partial differential operator $\hat{A}\colon C_c^\infty(\mathbb{R}^d, \mathbb{R}) \to C_b(\mathbb{R}^d, \mathbb{R})$, with image in the set of bounded continuous functions on $\mathbb{R}^d$, such that for all $\varphi \in C_c^\infty(\mathbb{R}^d, \mathbb{R})$,

$$\hat{A}\varphi = \text{Tr}(a\,\text{Hess}\,\varphi) + \langle 2\overrightarrow{\text{div}}(a) - f,\text{grad}\,\varphi \rangle$$

and the function $r\colon \mathbb{R}^d \to \mathbb{R}$ such that for all $x \in \mathbb{R}^d$,

$$r(\mathbf{x}) = \operatorname{div}(\overrightarrow{\operatorname{div}}(a) - f)(\mathbf{x}).$$

**Lemma 1** *For all $x \in \mathbb{R}^d$ the operator $\hat{A}$ defined in Definition 1 is the infinitesimal generator of the Itô diffusion $\hat{X}\colon [0,\infty) \times \Omega \to \mathbb{R}^d$ given, for all $t \ge 0$ and $\mathbb{P}$-a.s., by*

$$
\hat{X}\_t = x + \int\_0^t b(\hat{X}\_s) \mathrm{d}s + \int\_0^t \sigma(\hat{X}\_s) \mathrm{d}\hat{W}\_s,
$$

*where $\hat{W}\colon [0,\infty) \times \Omega \to \mathbb{R}^d$ is a $d$-dimensional Brownian motion and $b\colon \mathbb{R}^d \to \mathbb{R}^d$ is the function*

$$b = 2\overrightarrow{\text{div}}(a) - f.$$

From the well-known Feynman–Kac formula (see Karatzas and Shreve [6, Chapter 5, Theorem 7.6]) we can deduce Corollary 1 below for the initial value problem.

**Corollary 1** *Let $d \in \mathbb{N}$, $T > 0$, let $k\colon \mathbb{R}^d \to [0,\infty)$ be a continuous function, let $\hat{A}$ be the operator defined in Definition 1, and let $\psi\colon \mathbb{R}^d \to \mathbb{R}$ be an at most polynomially growing function. Suppose that $u \in C_b^{1,2}((0,T] \times \mathbb{R}^d, \mathbb{R})$ is continuously differentiable with bounded derivative in time and twice continuously differentiable with bounded derivatives in space, and satisfies the Cauchy problem*

$$\begin{aligned} \frac{\partial \boldsymbol{u}}{\partial t}(t, \mathbf{x}) + k(\mathbf{x})\boldsymbol{u}(t, \mathbf{x}) &= \hat{A}\boldsymbol{u}(t, \mathbf{x}), \quad (t, \mathbf{x}) \in (0, T] \times \mathbb{R}^d, \\ \boldsymbol{u}(0, \mathbf{x}) &= \boldsymbol{\psi}(\mathbf{x}), \qquad \mathbf{x} \in \mathbb{R}^d. \end{aligned} \tag{8}$$

*Then, for all $(t, x) \in (0, T] \times \mathbb{R}^d$, we have that*

$$u(t, x) = \mathbb{E}\left[\psi(\hat{X}_t)\exp\left(-\int_0^t k(\hat{X}_\tau) \,\mathrm{d}\tau\right) \,\middle|\, \hat{X}_0 = x\right],$$

*where X*ˆ *is the diffusion generated by A*ˆ*.*
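Corollary 1 turns the Cauchy problem into an expectation that can be estimated by averaging over Euler–Maruyama paths of $\hat{X}$. A sketch, checked against a case with a known solution: the heat equation $\partial_t u = \frac{1}{2}\partial_x^2 u$ (so $b = 0$, $\sigma = 1$, $k = 0$) with $\psi(x) = x^2$, whose exact solution is $u(t,x) = x^2 + t$:

```python
import numpy as np

def feynman_kac_mc(psi, k, b, sigma, x, t, n_paths, n_steps, rng):
    """Monte-Carlo estimate of
    u(t, x) = E[ psi(X_t) exp(-int_0^t k(X_tau) dtau) | X_0 = x ]
    using Euler-Maruyama paths of the auxiliary diffusion."""
    dt = t / n_steps
    X = np.full(n_paths, float(x))
    log_weight = np.zeros(n_paths)
    for _ in range(n_steps):
        log_weight -= k(X) * dt                           # accumulate -int k dtau
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        X = X + b(X) * dt + sigma(X) * dW
    return float(np.mean(psi(X) * np.exp(log_weight)))

rng = np.random.default_rng(2)
u_hat = feynman_kac_mc(psi=lambda x: x**2, k=lambda x: 0.0 * x,
                       b=lambda x: 0.0 * x, sigma=lambda x: 1.0 + 0.0 * x,
                       x=1.0, t=0.5, n_paths=200_000, n_steps=50, rng=rng)
```

For this test case the exact value at `x = 1.0`, `t = 0.5` is `1.5`, and the estimate should match to within a few Monte-Carlo standard errors.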

Recall that our aim is to approximate the Fokker–Planck equation (6). Assume from now on the discrete times $\{t_0 = 0, t_1, t_2, \dots\}$, indexed by $n$. Written in the form of Corollary 1, for any timestep $n = 1, 2, \dots$, (6) reads as

$$\begin{aligned} \frac{\partial q^n}{\partial t}(t, z) &= \hat{A}q^n(t, z) + r(z)q^n(t, z), & (t, z) &\in (t\_{n-1}, t\_n] \times \mathbb{R}^d, \\ q^n(0, z) &= p^{n-1}(z), & z &\in \mathbb{R}^d. \end{aligned}$$

Thus, with $k = -r$, and assuming that $-r$ is non-negative as required in (8), we obtain by Corollary 1 the representation, for all $n \in \{1,\dots,N\}$, $t \in (t_{n-1}, t_n]$, $z \in \mathbb{R}^d$,

$$q^n(t, z) = \mathbb{E}\left[p^{n-1}(\hat{X}_t)\exp\left(\int_{t_{n-1}}^t r(\hat{X}_\tau) \,\mathrm{d}\tau\right) \,\middle|\, \hat{X}_{t_{n-1}} = z\right]. \tag{9}$$

Note that [4, Proposition 2.4] shows that we have a feasible minimisation problem to approximate by the learning algorithm (see also [2, Proposition 2.7]).

#### *2.2 The Benes Filtering Model*

The Benes filter is a one-dimensional nonlinear model and is used as a benchmark in the numerical studies below. As recalled below, it is one of the rare cases of explicitly solvable continuous-time stochastic filtering models. Here, we consider a special case of the more general class of Benes filters presented, for example, in [1, Chapter 6.1].

The signal is given by the coefficient functions

$$f(\mathbf{x}) = \alpha \sigma \tanh(\beta + \alpha \mathbf{x}/\sigma) \text{ and } \sigma(\mathbf{x}) \equiv \sigma \in \mathbb{R},$$

where $\alpha, \beta \in \mathbb{R}$, and the observation is given by the affine-linear sensor function

$$h(x) = h\_1 x + h\_2,$$

with $h_1, h_2 \in \mathbb{R}$. The density $p_B$ of the filter solving the Benes model is then given by two weighted Gaussians (see [1, Chapter 6.1]) as

$$p_B(z) = w^+ \, \Phi(\mu_t^+, \nu_t)(z) + w^- \, \Phi(\mu_t^-, \nu_t)(z), \tag{10}$$

where $\mu_t^\pm = M_t^\pm/(2v_t)$, $\nu_t = 1/(2v_t)$, and

$$w^{\pm} = \frac{\exp\big((M_t^{\pm})^2/(4v_t)\big)}{\exp\big((M_t^{+})^2/(4v_t)\big) + \exp\big((M_t^{-})^2/(4v_t)\big)},$$

with

$$M_t^{\pm} = \pm\frac{\alpha}{\sigma} + h_1 \int_0^t \frac{\sinh(s\zeta\sigma)}{\sinh(t\zeta\sigma)} \,\mathrm{d}Y_s + \frac{h_2 + h_1 x_0}{\sigma \sinh(t\zeta\sigma)} - \frac{h_2}{\sigma} \coth(t\zeta\sigma),$$

$v_t = h_1 \coth(t\zeta\sigma)/(2\sigma)$, and $\zeta = \sqrt{\alpha^2/\sigma^2 + h_1^2}$.
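Given precomputed statistics $M_t^\pm$ and $v_t$, evaluating the density (10) is a two-component Gaussian mixture. A sketch with a hypothetical routine that takes $M_t^\pm$ and $v_t$ as inputs rather than recomputing them from the observation path:

```python
import numpy as np

def benes_density(z, M_plus, M_minus, v):
    """Benes filter density (10): a weighted pair of Gaussians with means
    M^{+-}/(2v), common variance 1/(2v), and weights proportional to
    exp((M^{+-})**2 / (4v))."""
    nu = 1.0 / (2.0 * v)                                  # common variance
    mu_p, mu_m = M_plus / (2.0 * v), M_minus / (2.0 * v)
    # softmax-style weights, shifted by the max exponent for stability
    a_p, a_m = M_plus**2 / (4.0 * v), M_minus**2 / (4.0 * v)
    m = max(a_p, a_m)
    w_p = np.exp(a_p - m) / (np.exp(a_p - m) + np.exp(a_m - m))
    w_m = 1.0 - w_p
    gauss = lambda mu: np.exp(-(z - mu)**2 / (2 * nu)) / np.sqrt(2 * np.pi * nu)
    return w_p * gauss(mu_p) + w_m * gauss(mu_m)

z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]
p = benes_density(z, M_plus=2.0, M_minus=-2.0, v=1.0)
```

Since the weights sum to one and each component is a normalised Gaussian, the mixture integrates to one, which is a convenient sanity check for the reference solution used in Sect. 3.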

Further, for the Benes model, the auxiliary diffusion is given as

$$\hat{X}_t = \hat{X}_0 - \int_0^t \alpha\sigma \tanh(\beta + \alpha\hat{X}_s/\sigma) \,\mathrm{d}s + \int_0^t \sigma \,\mathrm{d}\hat{W}_s,$$

and the coefficient

$$r(\mathbf{x}) = -\operatorname{div} f(\mathbf{x}) = -\alpha^2 \operatorname{sech}^2(\beta + \alpha \mathbf{x}/\sigma).$$

Therefore the representation of the solution to the Fokker–Planck equation (6) in the Benes case reads

$$q^n(t, z) = \mathbb{E}\left[p^{n-1}(\hat{X}_t)\exp\left(-\int_{t_{n-1}}^t \alpha^2 \operatorname{sech}^2(\beta + \alpha\hat{X}_\tau/\sigma) \,\mathrm{d}\tau\right) \,\middle|\, \hat{X}_{t_{n-1}} = z\right].$$
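As a quick sanity check, the identity $r = -\operatorname{div} f$ used above can be verified numerically with a central difference; the sketch uses the coefficient values from Sect. 3 ($\alpha = 3$, $\beta = 0$, $\sigma = 0.5$):

```python
import numpy as np

alpha, beta, sigma = 3.0, 0.0, 0.5   # Benes coefficients as in Sect. 3

def f(x):
    """Benes signal drift: alpha*sigma*tanh(beta + alpha*x/sigma)."""
    return alpha * sigma * np.tanh(beta + alpha * x / sigma)

def r(x):
    """Zero-order coefficient: -alpha^2 sech^2(beta + alpha*x/sigma)."""
    return -alpha**2 / np.cosh(beta + alpha * x / sigma) ** 2

# central-difference check that r(x) = -f'(x)
x = np.linspace(-2.0, 2.0, 101)
eps = 1e-6
r_fd = -(f(x + eps) - f(x - eps)) / (2.0 * eps)
max_err = float(np.max(np.abs(r_fd - r(x))))
```

The finite-difference derivative agrees with the closed form to near machine precision, confirming the sech-squared expression.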

#### *2.3 Neural Network Model for the Prediction Step*

To solve the Fokker–Planck equation over a rectangular domain $\Omega_d = [\alpha_1, \beta_1] \times \cdots \times [\alpha_d, \beta_d]$, we employ the sampling-based deep learning method from [2]. Using the representation (9), the solution of the Fokker–Planck equation is reformulated into an optimisation problem over function space, given in [4, Proposition 2.4]. This in turn yields the loss functions for the learning algorithm. Writing $\hat{X}^\xi$ for the auxiliary diffusion with $\mathrm{Unif}(\Omega_d)$-random initial value $\xi$, the optimisation problem is approximated by the optimisation

$$\inf_{\theta \in \mathbb{R}^{\sum_{i=2}^{L}(l_i l_{i-1} + l_i)}} \mathbb{E}\left[\left|\psi(\hat{X}_T^{\xi}) \exp\left(-\int_0^T k(\hat{X}_\tau^{\xi}) \,\mathrm{d}\tau\right) - \mathcal{NN}_\theta(\xi)\right|^2\right],$$

where the solution of the PDE is represented by a neural network $\mathcal{NN}_\theta$ and the infinite-dimensional function space has been parametrised by $\theta$. Here, $L$ denotes the depth of the neural net, and the parameters $l_i$ are the respective layer widths. Further details can be found in [4]. A comprehensive textbook on deep learning is [5]. We apply a modified gradient descent method, called ADAM [7], to determine the parameters in the model by minimising the *loss function*

$$\mathcal{L}\left(\theta; \{\xi^i, \{\hat{X}_{\tau_j}^{\xi,i}\}_{j=0}^{J}\}_{i=1}^{N_b}\right) = \frac{1}{N_b} \sum_{i=1}^{N_b} \left|\psi(\hat{X}_T^{\xi,i}) \exp\left(-\sum_{j=0}^{J-1} k(\hat{X}_{\tau_j}^{\xi,i})(\tau_{j+1} - \tau_j)\right) - \mathcal{NN}_\theta(\xi^i)\right|^2,$$

where $N_b$ is the batch size and $\{\xi^i, \{\hat{X}_{\tau_j}^{\xi,i}\}_{j=0}^{J}\}_{i=1}^{N_b}$ is a training batch of independent identically distributed realisations $\xi^i$ of $\xi \sim \mathrm{Unif}(\Omega_d)$, with $\{\hat{X}_{\tau_j}^{\xi,i}\}_{j=0}^{J}$ the approximate i.i.d. realisations of sample paths of the auxiliary diffusion started at $\xi^i$ over the time grid $\tau_0 = 0 < \tau_1 < \cdots < \tau_{J-1} < \tau_J = T$. For the approximation of the sample paths of the diffusion we use the Euler–Maruyama method [8]. Additionally, we augment the loss $\mathcal{L}$ by a term that encourages the positivity of the neural network. Thus, in practice, we use the loss

$$\tilde{\mathcal{L}}\left(\theta; \{\xi^i, \{\hat{X}_{\tau_j}^{\xi,i}\}_{j=0}^{J}\}_{i=1}^{N_b}\right) = \mathcal{L}\left(\theta; \{\xi^i, \{\hat{X}_{\tau_j}^{\xi,i}\}_{j=0}^{J}\}_{i=1}^{N_b}\right) + \lambda \sum_{i=1}^{N_b} \max\{0, -\mathcal{NN}_\theta(\xi^i)\},$$

with the hyperparameter *λ* to be chosen.
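With Euler–Maruyama samples in hand, the discretised loss above is a few lines of array code. In the sketch below the neural network is stood in for by an arbitrary callable, since the point is the construction of the Feynman–Kac target, not the architecture; all names are illustrative:

```python
import numpy as np

def training_loss(nn, xi0, paths, taus, psi, k, lam=1.0):
    """Discretised loss L-tilde: squared difference between the sampled
    Feynman-Kac target and nn(xi0), plus a penalty on negative outputs.

    xi0   : (N_b,) initial points xi^i ~ Unif(domain)
    paths : (N_b, J+1) Euler-Maruyama paths started at xi0
    taus  : (J+1,) time grid tau_0 < ... < tau_J = T
    """
    dtau = np.diff(taus)                                # (J,)
    # left-point quadrature of int k(X_tau) dtau along each path
    integral = np.sum(k(paths[:, :-1]) * dtau, axis=1)
    target = psi(paths[:, -1]) * np.exp(-integral)      # Feynman-Kac sample
    pred = nn(xi0)
    mse = np.mean((target - pred) ** 2)
    positivity = lam * np.sum(np.maximum(0.0, -pred))   # encourage nn >= 0
    return float(mse + positivity)

rng = np.random.default_rng(3)
taus = np.linspace(0.0, 0.1, 11)
xi0 = rng.uniform(-1.0, 1.0, size=64)
# zero-drift, unit-diffusion paths, purely for illustration
increments = rng.standard_normal((64, 10)) * np.sqrt(np.diff(taus))
paths = xi0[:, None] + np.cumsum(
    np.concatenate([np.zeros((64, 1)), increments], axis=1), axis=1)
loss = training_loss(nn=lambda x: np.exp(-x**2), xi0=xi0, paths=paths,
                     taus=taus, psi=lambda x: np.exp(-x**2), k=lambda x: 0.0 * x)
```

In the actual method, `nn` is the feed-forward network and this quantity is minimised over $\theta$ with ADAM; the sketch only shows how the target and penalty terms are assembled from the sampled paths.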

Thus, in the notation of Sect. 1.2 we replace the Fokker–Planck solution by a neural network model, i.e. we *postulate* a neural network model

$$\tilde{p}^n(z) = \mathcal{NN}(z),$$

with support on $\Omega_d$. Therefore we require the a priori chosen domain to capture most of the mass of the probability distribution it is approximating.

#### *2.4 Monte-Carlo Normalisation Step*

We then realise the normalisation step via Monte-Carlo sampling over the bounded rectangular domain $\Omega_d$ to approximate the integral

$$\int_{\mathbb{R}^d} \xi_n(z) \mathcal{NN}(z) \,\mathrm{d}z = \int_{\Omega_d} \exp\left(-\frac{t_n - t_{n-1}}{2} \|z_n - h(z)\|^2\right) \mathcal{NN}(z) \,\mathrm{d}z, \tag{11}$$

where, as defined earlier, $z_n = (Y_{t_n} - Y_{t_{n-1}})/(t_n - t_{n-1})$. Note that, since $\Omega_d$ is the support of the neural network $\mathcal{NN}$, the right-hand side above is indeed identical to the integral over the whole space.

The sensor function in the Benes model is given by $h(x) = h_1 x + h_2$. Then, the likelihood function becomes

$$\xi\_n(z) = \frac{\sqrt{2\pi}}{\sqrt{(t\_n - t\_{n-1})h\_1^2}} \mathcal{N}\_{\mathrm{pdf}}\left(\frac{z\_n - h\_2}{h\_1}, \frac{1}{(t\_n - t\_{n-1})h\_1^2}\right)(z),$$

where $\mathcal{N}_{\mathrm{pdf}}(\mu, \sigma^2)$ denotes the probability density function of a normal distribution with mean $\mu$ and variance $\sigma^2$. Therefore, we can write the integral (11) as

$$\frac{\sqrt{2\pi}}{\sqrt{(t\_n - t\_{n-1})h\_1^2}} \mathbb{E}\_Z[\mathcal{N}\mathcal{N}(Z)]; \qquad Z \sim \mathcal{N}\left(\frac{z\_n - h\_2}{h\_1}, \frac{1}{(t\_n - t\_{n-1})h\_1^2}\right).$$

This is an implementable method to compute the normalisation constant $C_n$. Thus, we can express the approximate posterior density as

$$p^n(z) = \frac{1}{C\_n} \xi\_n(z) \tilde{p}^n(z).$$

Therefore, the methodology is fully recursive and can be applied sequentially.
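The expectation above can be estimated by plain Monte-Carlo; a sketch with the neural network replaced by an arbitrary nonnegative callable (names hypothetical):

```python
import numpy as np

def mc_normalising_constant(nn, z_n, h1, h2, dt, n_samples, rng):
    """Estimate C_n = sqrt(2*pi/(dt*h1**2)) * E[nn(Z)] with
    Z ~ N((z_n - h2)/h1, 1/(dt*h1**2)), as in (11)."""
    mean = (z_n - h2) / h1
    std = 1.0 / np.sqrt(dt * h1**2)
    Z = rng.normal(mean, std, size=n_samples)
    return float(np.sqrt(2.0 * np.pi / (dt * h1**2)) * np.mean(nn(Z)))

rng = np.random.default_rng(4)
# sanity check: with nn a standard Gaussian pdf, C_n has a closed form
# via the Gaussian convolution identity
nn = lambda z: np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
C = mc_normalising_constant(nn, z_n=0.3, h1=3.0, h2=0.0, dt=0.1,
                            n_samples=200_000, rng=rng)
```

Because `Z` is drawn directly from the Gaussian factor of the likelihood, no rejection or quadrature grid is needed, which preserves the mesh-free character of the method.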

*Remark 1* In low-dimensions, the usage of the Monte-Carlo method to perform the normalisation is optional, since efficient quadrature methods are an alternative. We chose the sampling based method to preserve the grid-free nature of the algorithm.

#### **3 Numerical Results for the Benes Filter**

The neural network architecture for all our experiments below is a feed-forward fully connected neural network with a one-dimensional input layer, two hidden layers of 51 neurons each with batch-normalisation, and an output layer of dimension one (a detailed illustration can be found in [4]). For the optimisation algorithm we chose the ADAM optimiser and performed the training over 6002 epochs with a batch size of 600 samples. The initial signal and observation values are $x_0 = y_0 = 0$, and the coefficients of the Benes model were chosen as $\alpha = 3$, $\beta = 0$, $\sigma = 0.5$, $h_1 = 3$, $h_2 = 0$, with timestep $\Delta t = 0.1$ over $N = 40$ steps. The initial condition is a Gaussian density with mean 0 and standard deviation 0.001. The posterior was calculated over the domain $[-9, 2.5]$. The domain boundaries were pre-estimated using a simulation of the exact Benes filter with fixed random seed. In the case of the domain adaptation we used the precomputed evolutions from the true solution to estimate the support of the posterior and set a fixed domain adaptation schedule. The spatial resolution is 1000 uniformly spaced values in the domain of definition of the neural network. At each time step, the training of the network consumes $6002 \cdot 600 = 3{,}601{,}200$ Monte-Carlo samples. Additionally, we employ a piecewise constant learning rate schedule $lr(\mathrm{epoch}) = 10^{-(2 + \lfloor \mathrm{epoch}/2001 \rfloor)}$, and the normalisation constant is computed using $10^7$ samples at each timestep. The regularising parameter is $\lambda = 1$.

#### *3.1 No Domain Adaptation*

Figure 1 shows the plots for the Benes filter without domain adaptation. In Fig. 1a we observe the drift of the posterior toward the left edge of the domain. The initial bimodality, reflecting the uncertainty due to the few observed values, quickly resolves, and the approximate posterior tracks the signal within the domain. In Fig. 1b the bimodality is mostly visible in the Monte-Carlo prior and is smoothed out by the neural network. Figure 1c and d show snapshots of the progression of the filter. The absolute error in means with respect to the Benes reference solution is plotted in Fig. 2a and shows that as the posterior reaches the left domain boundary, the error increases. This is reflected as well in the drop of probability mass, Fig. 2c, and of the Monte-Carlo acceptance rate, Fig. 2d, at later times. It is not clear from Fig. 2a whether there is a trend in the error; further experiments need to be performed to check this. Figure 2b shows that the neural net training consistently succeeds, as measured by the $L^2$ distance between the Monte-Carlo reference prior and the neural net prior.

**Fig. 1** Results of the combined splitting-up/machine-learning approximation applied iteratively to the Benes filtering problem (no domain adaptation). (**a**) The full evolution of the estimated posterior distribution produced by our method, plotted at all intermediate timesteps. (**b**–**d**) Snapshots of the approximation at times, *t* = 0*.*6, *t* = 1*.*8, and *t* = 3*.*9. The black dotted line in each graph shows the estimated posterior, the yellow line the prior estimate represented by the neural network, and the light-blue shaded line shows the Monte-Carlo reference solution for the prior

**Fig. 2** Error and diagnostics for the Benes filter (no domain adaptation). (**a**) Absolute error in means between the approximated distribution and the exact solution. (**b**) *L*<sup>2</sup> error of the neural network during training with respect to the Monte-Carlo reference solution. (**c**) Probability mass of the neural network prior. (**d**) Monte-Carlo acceptance rate

#### *3.2 With Domain Adaptation*

Figure 3 shows the plots for the Benes filter with domain adaptation. In Fig. 3a we observe again the drift of the posterior toward the left edge of the domain, and the initial bimodality resolves. The approximate posterior tracks the signal within the domain. In Fig. 3b the bimodality is visible both in the prior and in the posterior network. This shows that the domain adaptation helps resolve the bimodality in the nonlinear case by increasing the spatial resolution while keeping the computational cost equal. Figure 3c and d again show snapshots of the progression of the filter. The absolute error in means with respect to the Benes reference solution is plotted in Fig. 4a and shows a clear linear trend. This is an interesting phenomenon, likely due to the reduced domain size and subsequent error accumulation. The probability mass, Fig. 4c, and the Monte-Carlo acceptance rate, Fig. 4d, fluctuate stably. Figure 4b shows here again that the neural net training consistently succeeds.

**Fig. 3** Results of the combined splitting-up/machine-learning approximation applied iteratively to the Benes filtering problem (with domain adaptation). (**a**) The full evolution of the estimated posterior distribution produced by our method, plotted at all intermediate timesteps. (**b**–**d**) Snapshots of the approximation at times, *t* = 0*.*6, *t* = 1*.*8, and *t* = 3*.*9. The black dotted line in each graph shows the estimated posterior, the yellow line the prior estimate represented by the neural network, and the light-blue shaded line shows the Monte-Carlo reference solution for the prior

**Fig. 4** Error and diagnostics for the Benes filter (with domain adaptation). (**a**) Absolute error in means between the approximated distribution and the exact solution. (**b**) *L*<sup>2</sup> error of the neural network during training with respect to the Monte-Carlo reference solution. (**c**) Probability mass of the neural network prior. (**d**) Monte-Carlo acceptance rate

#### **4 Conclusion and Outlook**

We have studied domain adaptation in our method from [4] on the example of the Benes filter. We observed that the domain-adapted method was more effective in resolving the bimodality than the non-domain-adapted one. However, this came at the cost of a linear trend in the error. A possible direction for future work would thus be to investigate the optimal domain size more closely, in order to mitigate the error trend and make full use of the increased resolution from the domain adaptation. This is the subject of future research, in connection with more general domain adaptation methods than the one employed here, which is specific to the Benes filter.

As already noted in the previous work [4], the possibility for *transfer learning* in our method should be explored.

A long-term goal in the development of neural network based numerical methods must of course be the rigorous error analysis, which remains a challenging task.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **End-to-End Kalman Filter in a High Dimensional Linear Embedding of the Observations**

**Said Ouala, Pierre Tandeo, Bertrand Chapron, Fabrice Collard, and Ronan Fablet**

**Abstract** Data assimilation techniques are the state-of-the-art approaches for the reconstruction of a spatio-temporal geophysical state such as the atmosphere or the ocean. These methods rely on a numerical model that fills the spatial and temporal gaps in the observational network. Unfortunately, limitations regarding the uncertainty of the state estimate may arise when restricting data assimilation problems to a small subset of observations, as encountered for instance in ocean surface reconstruction. These limitations motivated the exploration of reconstruction techniques that do not rely on numerical models. In this context, the increasing availability of geophysical observations and model simulations motivates the exploitation of machine learning tools to tackle the reconstruction of ocean surface variables. In this work, we formulate sea surface spatio-temporal reconstruction problems as state-space Bayesian smoothing problems with unknown augmented linear dynamics. The solution of the smoothing problem, given by the Kalman smoother, is written in a differentiable framework which allows, given some training data, the optimization of the parameters of the state-space model.

**Keywords** Kalman filter · Machine learning · Spatio-temporal interpolation

#### **1 Introduction**

Data assimilation in a broad sense can be considered as the inference of a hidden state, based on several sources of information. When considering data assimilation

B. Chapron Ifremer, LOPS, Plouzané, France

F. Collard ODL, Locmaria-Plouzané, France

S. Ouala (-) · P. Tandeo · R. Fablet IMT Atlantique, Lab-STICC, Brest, France

e-mail: said.ouala@imt-atlantique.fr

in the context of oceanography, these schemes exploit, in addition to some given observations, a dynamical model to perform simulations from given ocean states [1]. Unfortunately, realistic analytic parameterizations of the dynamical model, in the context of sea surface variables reconstruction, lead to computationally demanding representations [2]. Furthermore, when associated with a small subset of observations (as encountered for instance when assimilating sea surface variables with a global ocean model), these realistic models may result in modeling and inversion uncertainties. On the other hand, the analytic derivation of computationally efficient, low-order models involves theoretical assumptions which may not be fulfilled by real observations. These limitations motivated the exploration of interpolation techniques that do not require an explicit dynamical representation. Among other methods, Optimal Interpolation (OI) became the state-of-the-art framework [3, 4]. This technique does not need an explicit formulation of the dynamical model and instead relies on modeling the covariance of the spatio-temporal fields. Despite the success of OI, this technique tends to smooth the fine-scale structures, which motivates the development of new spatio-temporal interpolation schemes, mainly based on machine learning representations [5–10].

From the perspective of the machine learning community, state-of-the-art reconstruction techniques are usually formulated as inverse problems, where one seeks to maximize the reconstruction performance of an inversion model, given the observed field as an input. Several methods were developed for this purpose in the fields of signal denoising [11, 12] and image inpainting [13], where the inversion model typically relies on a deep learning architecture. This *end-to-end* learning strategy differs from classical inversion techniques used in geosciences, where the state-space representations (specifically the dynamical models) and the inversion schemes are a priori unrelated. The recent exploration of machine learning representations in the context of sea surface field reconstruction was inspired by the latter methodological viewpoint, where a data-driven dynamical model is optimized based on the minimization of a forecasting cost. This data-driven prior is then plugged into a data assimilation framework to perform reconstruction based on classical (Kalman-based, variational) inversion schemes [7, 14, 8].

Recently, several works investigated *end-to-end* deep learning architectures for the resolution of reconstruction issues in geosciences [15–17, 10]. However, these tools, although relevant, were mostly explored in the context of image denoising and inpainting applications and lack a methodological formulation suited to geosciences. When considering geoscience applications, a huge effort was carried out within the geosciences community to derive reconstruction algorithms that, beyond being efficient with respect to a given metric, are robust and rely on a solid methodological formulation. From this point of view, we believe that *end-to-end* deep learning techniques should build on such methodological knowledge to propose new reconstruction solutions that achieve a decent performance score while remaining theoretically grounded, which helps the understanding and generalization of these algorithms. With this in mind, we exploit ideas from machine learning and Bayesian filtering to propose a framework that is able to provide a relevant reconstruction of a spatio-temporal state. Specifically, we formulate a new state-space model for ocean surface observations based on an augmented linear dynamical system. Assuming that the model and observation errors are Gaussian, the solution of the filtering/smoothing problem on this new state-space model is given by the Kalman filter/smoother. Inspired by deep learning architectures, the Kalman recursion is written in a differentiable framework, which allows for the derivation of the parameters of the new state-space model based on a reconstruction cost of the observations.

#### **2 Method**

**Motivation** Let us assume the following state-space model

$$
\dot{\mathbf{x}}_t = f(\mathbf{x}_t) + \boldsymbol{\eta}_t \tag{1}
$$

$$
\mathbf{y}_t = \mathcal{H}_t(\mathbf{x}_t) + \boldsymbol{\epsilon}_t \tag{2}
$$

where $t \in [0, +\infty)$ is time. The variables $\mathbf{x}_t \in \mathbb{R}^s$ and $\mathbf{y}_t \in \mathbb{R}^n$ represent the state variables and the observations, respectively. $f$ and $\mathcal{H}_t$ are the dynamical and observation operators. $\boldsymbol{\eta}_t$ and $\boldsymbol{\epsilon}_t$ are random processes accounting for the uncertainties. They are defined as centered Gaussian processes with covariances $\mathbf{Q}_t$ and $\mathbf{R}_t$, respectively.
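As a concrete illustration (not from the paper), the generic state-space model of Eqs. (1)–(2) can be simulated with a simple Euler–Maruyama discretization; the dynamics `f`, observation operator `H`, and noise covariances below are toy choices:

```python
import numpy as np

def simulate_state_space(f, H, Q, R, x0, dt, n_steps, rng=None):
    """Euler-Maruyama simulation of Eqs. (1)-(2):
    dx/dt = f(x) + eta_t (model noise, covariance Q),
    y_t   = H(x_t) + eps_t (observation noise, covariance R)."""
    rng = np.random.default_rng(rng)
    s, n = len(x0), R.shape[0]
    xs = np.empty((n_steps, s))
    ys = np.empty((n_steps, n))
    x = np.asarray(x0, dtype=float)
    Lq = np.linalg.cholesky(Q)  # to sample eta ~ N(0, Q)
    Lr = np.linalg.cholesky(R)  # to sample eps ~ N(0, R)
    for k in range(n_steps):
        x = x + dt * f(x) + np.sqrt(dt) * Lq @ rng.standard_normal(s)
        xs[k] = x
        ys[k] = H(x) + Lr @ rng.standard_normal(n)
    return xs, ys

# Toy example: damped scalar dynamics, identity observation operator
xs, ys = simulate_state_space(
    f=lambda x: -0.5 * x, H=lambda x: x,
    Q=np.array([[0.01]]), R=np.array([[0.04]]),
    x0=np.array([1.0]), dt=0.1, n_steps=100, rng=0)
```

The hidden trajectory `xs` plays the role of the true state $\mathbf{x}_t$, while `ys` are the noisy observations $\mathbf{y}_t$ that the method below is trained on.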

In the context of geosciences, and when considering the resolution of filtering and smoothing problems using data assimilation, the dynamical and observation models $f$ and $\mathcal{H}$, the model and observation error covariances $\mathbf{Q}_t$ and $\mathbf{R}_t$, as well as the true state $\mathbf{x}_t$ of Eqs. (1) and (2), are either unavailable or too complicated to handle. In this context, we show in this work how to exploit observations $\mathbf{y}_t$ sampled from time $t_1$ to time $t_f$ to learn a Bayesian scheme that allows for reconstruction applications given new observations (i.e., at times $t > t_f$).

**Definition of a New State Space Model** In this work, we consider an embedding of the observations as proposed in [18]. Specifically, we project our observations (or a reduced-order version of them) into a higher-dimensional space where the dynamics of the observations are assumed to be linear. Formally, in order to derive our new state-space model, we first write an augmented state $\mathbf{u}_t$ such that $\mathbf{u}_t^T = [(\mathbf{M}\mathbf{y}_t)^T, \mathbf{z}_t^T]$, where $\mathbf{z}_t \in \mathbb{R}^l$ is the unobserved component of the augmented state $\mathbf{u}_t$ and $\mathbf{M} \in \mathbb{R}^{r \times n}$, with $r \le n$, is a linear projection operator (which can be used, for instance, in the context of reduced-order modeling). The matrix $\mathbf{M}$ is assumed to have $r$ orthogonal rows, so that the matrix $\mathbf{M}^{-1} = \mathbf{M}^T$ verifies $\mathbf{M}\mathbf{M}^{-1} = \mathbf{I}$. We use in this work an Empirical Orthogonal Functions (EOF) projection, which constrains $\mathbf{M}$ to be a matrix of orthogonal eigenvectors of the covariance matrix of the centered data. The augmented state $\mathbf{u}_t \in \mathbb{R}^{d_E}$, with $d_E = l + r$, evolves in time according to the following state-space model:

$$
\dot{\mathbf{u}}_t = \mathbf{A}_\sigma \mathbf{u}_t + \boldsymbol{\eta}_t \tag{3}
$$

$$
\mathbf{y}_t = \mathbf{M}^{-1}\mathbf{G}\mathbf{u}_t + \boldsymbol{\epsilon}_t \tag{4}
$$

where the dynamical operator $\mathbf{A}_\sigma$ is a $d_E \times d_E$ matrix with coefficients $\sigma$. $\mathbf{G}$ is a projection matrix that satisfies $\mathbf{M}\mathbf{y}_t = \mathbf{G}\mathbf{u}_t$. The eigenvalues of the matrix $\mathbf{A}_\sigma$ encode the decaying and oscillating modes of the dynamics that are learned from data. Furthermore, the matrix $\mathbf{A}_\sigma$ can be constrained to be skew-symmetric (simply by imposing $\mathbf{A}_\sigma = 0.5(\mathbf{B}_\sigma - \mathbf{B}_\sigma^T)$, with $\mathbf{B}_\sigma$ a trainable matrix), so that the solution of (3) is a weighted sum of $d_E/2$ trainable oscillations, with the corresponding frequencies encoded in the imaginary parts of the eigenvalues of $\mathbf{A}_\sigma$. This formulation is highly suitable for Hamiltonian (conservative) dynamical systems, since the energy of the system is conserved, which guarantees the long-term boundedness of the model. Furthermore, this formulation differs fundamentally from classical autoregressive (AR) models written in the space of the observations. Indeed, simple AR models only have $r < d_E$ eigenvalues, which limits their expressivity.
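A minimal numerical check (our own, not from the paper) of the skew-symmetric constraint: with $\mathbf{A}_\sigma = 0.5(\mathbf{B}_\sigma - \mathbf{B}_\sigma^T)$, all eigenvalues are purely imaginary and the flow is norm-preserving, so the linear dynamics of Eq. (3) reduce to undamped oscillations:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))   # stands in for the trainable matrix B_sigma
A = 0.5 * (B - B.T)               # skew-symmetric A_sigma

# Eigenvalues of a real skew-symmetric matrix are purely imaginary:
# the solution of du/dt = A u is a sum of d_E/2 undamped oscillations.
eigvals = np.linalg.eigvals(A)
assert np.allclose(eigvals.real, 0.0, atol=1e-10)

# The flow e^{dt A} is orthogonal, so the state norm (energy) is conserved.
w, V = np.linalg.eig(A.astype(complex))
F = ((V * np.exp(0.1 * w)) @ np.linalg.inv(V)).real  # matrix exponential via eig
u0 = rng.standard_normal(6)
u1 = F @ u0
assert np.isclose(np.linalg.norm(u1), np.linalg.norm(u0))
```

This is the long-term boundedness property mentioned above: no eigenvalue has a positive real part that could make trajectories blow up.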

It is worth noting that this formulation closely relates to the Koopman operator [19], where the augmented state $\mathbf{u}_t$ can be seen as a finite-dimensional approximation of the infinite-dimensional Hilbert space of measurements of the hidden state $\mathbf{x}_t$. This model takes advantage of a linear formulation of the dynamics in a space of observables, where the resulting model is perfectly linear for a category of dynamical regimes (typically periodic and quasi-periodic ones), and can provide a decent short-term approximation of chaotic regimes. It can also be seen as a generalization of the Dynamic Mode Decomposition (DMD) method, in which $\mathbf{u}_t = \mathbf{M}\mathbf{y}_t$.
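The EOF projection operator $\mathbf{M}$ introduced in the state-space definition (orthonormal rows given by leading eigenvectors of the covariance of the centered data) can be sketched as follows; the random matrix stands in for real observations:

```python
import numpy as np

def eof_projector(Y, r):
    """EOF (PCA) projection: the rows of M are the r leading orthonormal
    eigenvectors of the covariance of the centered data Y (samples x n).
    Then M @ M.T = I_r, as required for M^{-1} = M^T."""
    Yc = Y - Y.mean(axis=0)
    # The right singular vectors of the centered data are the eigenvectors
    # of its covariance matrix, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Vt[:r]                # shape (r, n)

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 8))    # toy data: 200 samples in R^8
M = eof_projector(Y, r=3)
assert np.allclose(M @ M.T, np.eye(3))   # orthonormal rows
```

Applying `M` to an observation gives the reduced component $\mathbf{M}\mathbf{y}_t$ of the augmented state, and `M.T` maps it back into observation space.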

**Model and Observations Error Covariances** The model and observation errors $\boldsymbol{\eta}_t$ and $\boldsymbol{\epsilon}_t$ are assumed to follow Gaussian distributions with zero mean and covariance matrices $\mathbf{Q}_{\lambda,t}$ and $\mathbf{R}_{\phi,t}$, respectively. These covariance models can be parameterized as neural networks with parameter vectors $\lambda$ and $\phi$.

**Smoothing Scheme** A Kalman smoother, based on the above state-space model, is written in a differentiable framework. The idea is to derive an analytical solution of the posterior distribution $p(\mathbf{u}_t|\mathbf{y}_{t_1:t_f})$, based on the Kalman recursion. Formally, given a regular time discretization $t \in [t_1, \ldots, t_N]$, where $N$ is a positive integer, and given the initial moments $\mathbf{u}^a_{t_1}$ and $\mathbf{P}^a_{t_1}$, the mean $\mathbf{u}^s$ and covariance $\mathbf{P}^s$ of the posterior distribution $p(\mathbf{u}_t|\mathbf{y}_{t_1:t_f})$ can be computed as follows:

$$
\mathbf{u}^f_{t+1} = \mathbf{F}\,\mathbf{u}^a_t \tag{5}
$$

$$
\mathbf{P}^f_{t+1} = \mathbf{F}\,\mathbf{P}^a_t\,\mathbf{F}^T + \mathbf{Q}_{\lambda,t} \tag{6}
$$

$$
\mathbf{K}_{t+1} = \mathbf{P}^f_{t+1}\mathbf{H}^T\,[\mathbf{H}\,\mathbf{P}^f_{t+1}\mathbf{H}^T + \mathbf{R}_{\phi,t}]^{-1} \tag{7}
$$

$$
\mathbf{u}^a_{t+1} = \mathbf{u}^f_{t+1} + \mathbf{K}_{t+1}\,[\mathbf{y}_{t+1} - \mathbf{H}\,\mathbf{u}^f_{t+1}] \tag{8}
$$

$$
\mathbf{P}^a_{t+1} = \mathbf{P}^f_{t+1} - \mathbf{K}_{t+1}\,\mathbf{H}\,\mathbf{P}^f_{t+1} \tag{9}
$$

$$
\mathbf{K}^s_{t+1} = \mathbf{P}^a_{t+1}\mathbf{F}^T\,(\mathbf{P}^f_{t+2})^{-1} \tag{10}
$$

$$
\mathbf{u}^s_{t+1} = \mathbf{u}^a_{t+1} + \mathbf{K}^s_{t+1}\,[\mathbf{u}^s_{t+2} - \mathbf{u}^f_{t+2}] \tag{11}
$$

$$
\mathbf{P}^s_{t+1} = \mathbf{P}^a_{t+1} - \mathbf{K}^s_{t+1}\,(\mathbf{P}^f_{t+2} - \mathbf{P}^s_{t+2})\,(\mathbf{K}^s_{t+1})^T \tag{12}
$$

where $\mathbf{F} = e^{dt\,\mathbf{A}_\sigma}$, with $dt$ the prediction time step, and $\mathbf{H} = \mathbf{M}^{-1}\mathbf{G}$. The smoothing (Eqs. (10), (11) and (12)) is carried out backward in time, initialized with $\mathbf{P}^s_{t_f} = \mathbf{P}^a_{t_f}$ and $\mathbf{u}^s_{t_f} = \mathbf{u}^a_{t_f}$.
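The forward filtering pass (Eqs. (5)–(9)) followed by the backward Rauch–Tung–Striebel smoothing pass (Eqs. (10)–(12)) can be sketched in plain NumPy; this is a non-differentiable illustration, whereas the paper implements the same recursion in a differentiable framework with trainable $\mathbf{F}$, $\mathbf{Q}$ and $\mathbf{R}$:

```python
import numpy as np

def kalman_smoother(ys, F, H, Q, R, u0, P0):
    """Forward Kalman filter followed by the backward RTS smoother,
    for time-invariant F, H, Q, R (a simplifying assumption)."""
    N, dE = len(ys), len(u0)
    uf = np.empty((N, dE)); Pf = np.empty((N, dE, dE))   # forecasts
    ua = np.empty((N, dE)); Pa = np.empty((N, dE, dE))   # analyses
    u, P = u0, P0
    for t in range(N):                          # forward pass
        u, P = F @ u, F @ P @ F.T + Q           # forecast step (Eqs. 5-6)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # gain (Eq. 7)
        uf[t], Pf[t] = u, P
        u = u + K @ (ys[t] - H @ u)             # analysis step (Eq. 8)
        P = P - K @ H @ P                       # analysis covariance (Eq. 9)
        ua[t], Pa[t] = u, P
    us, Ps = ua.copy(), Pa.copy()               # smoother init at final time
    for t in range(N - 2, -1, -1):              # backward pass (Eqs. 10-12)
        Ks = Pa[t] @ F.T @ np.linalg.inv(Pf[t + 1])
        us[t] = ua[t] + Ks @ (us[t + 1] - uf[t + 1])
        Ps[t] = Pa[t] - Ks @ (Pf[t + 1] - Ps[t + 1]) @ Ks.T
    return us, Ps

# Scalar toy example: constant signal observed with small noise
F = np.eye(1); H = np.eye(1)
Q, R = 1e-4 * np.eye(1), 1e-2 * np.eye(1)
ys = np.ones((10, 1))
us, Ps = kalman_smoother(ys, F, H, Q, R, np.zeros(1), np.eye(1))
```

With these toy settings the smoothed mean converges toward the observed constant value.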

**Learning Scheme** The tuning of the trainable parameter vector $\theta = [\sigma, \lambda, \phi]^T$ is carried out using the following loss function: $\hat{\theta} = \arg\min_\theta \{\gamma_1 \mathcal{L}_1 + \gamma_2 \mathcal{L}_2\}$, where

$$
\mathcal{L}_1 = \sum_{t=t_1}^{t_N} \|\mathbf{y}_t - \mathbf{H}\mathbf{u}^s_t\|^2, \qquad
\mathcal{L}_2 = \sum_{t=t_1}^{t_N} \frac{1}{2}\log\left(|\mathbf{H}\mathbf{P}^f_t\mathbf{H}^T + \mathbf{R}_{\phi,t}|\right) + \frac{1}{2}\,\|\mathbf{y}_t - \mathbf{H}\mathbf{u}^f_t\|^2_{\mathbf{H}\mathbf{P}^f_t\mathbf{H}^T + \mathbf{R}_{\phi,t}}
$$

and $\gamma_1$ and $\gamma_2$ are weighting parameters. The first term $\mathcal{L}_1$ is simply the quadratic reconstruction error of the observations; minimizing this error helps recover an initial guess of the trainable parameters. The second term $\mathcal{L}_2$ is the negative log-likelihood of the observations. This likelihood is derived from the likelihood of the innovation, i.e., $p(\mathbf{y}_{1:T}) = \prod_{t=1}^{T} p(\mathbf{y}_t|\mathbf{y}_{1:t-1})$ [20].
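A literal (non-differentiable) sketch of the two loss terms, assuming the forecast means $\mathbf{u}^f_t$, forecast covariances $\mathbf{P}^f_t$ and smoothed means $\mathbf{u}^s_t$ from the Kalman recursion are available as arrays; variable names are illustrative:

```python
import numpy as np

def loss_l1(ys, us, H):
    """L1: quadratic reconstruction error of the observations
    from the smoothed means u^s_t."""
    resid = ys - us @ H.T
    return float(np.sum(resid ** 2))

def loss_l2(ys, uf, Pf, H, R):
    """L2: negative log-likelihood of the observations, built from the
    Gaussian innovation likelihood p(y_t | y_{1:t-1})."""
    nll = 0.0
    for t in range(len(ys)):
        S = H @ Pf[t] @ H.T + R               # innovation covariance
        d = ys[t] - H @ uf[t]                 # innovation
        nll += 0.5 * (np.log(np.linalg.det(S)) + d @ np.linalg.solve(S, d))
    return float(nll)

# Degenerate scalar check: perfect forecast, unit observation noise
H = np.eye(1); R = np.eye(1)
ys = np.array([[1.0]]); uf = np.array([[1.0]]); Pf = np.zeros((1, 1, 1))
```

In the paper both terms are evaluated inside an automatic-differentiation framework, so their gradients with respect to $\theta = [\sigma, \lambda, \phi]$ flow through the whole Kalman recursion.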

#### **3 Numerical Experiments**

#### *3.1 Preliminary Analysis on SST Anomaly Data*

As an illustration of the proposed framework, we consider scalar measurements of the anomaly of the Sea Surface Temperature (SST) in the Mediterranean Sea (43.8°N, 8.6°E). The data are computed based on the annual 99th percentile of Sea Surface Temperature (SST) from model data [21]. The time series consists of daily measurements of the SST anomaly from 1987 to 2019. The training data is composed of a sparse sampling of the original time series, as highlighted in Fig. 1a. The proposed framework is tested with the following configuration: the augmented state-space model is built with $\mathbf{M} = \mathbf{I}_1$ and $\mathbf{z} \in \mathbb{R}^5$. The model error covariance is a constant matrix of size $d_E \times d_E$, and the observation error covariance is a scalar parameter that corresponds to the variance of the SST anomaly measurement error. Finally, the training is carried out with $\gamma_1 = 0$ and $\gamma_2 = 1$.

Figure 1b highlights the reconstruction performance of the smoothing Probability Density Function (PDF) with respect to the true (unobserved) state. Interestingly, and despite the fact that the observations used to train the parameters of the Kalman filtering scheme were extremely sparse, the proposed framework is able to capture the correct underlying frequencies. Furthermore, the coverage probability of the PDF highlights the effectiveness of the estimated model and observation error covariances.

#### *3.2 Shallow Water Equation (SWE) Case-Study*

**Dataset Description** We consider the SWE without wind stress and bottom friction. The momentum equations are taken to be linear, and the continuity equation is solved in its nonlinear form. The direct numerical simulation is carried out using a finite difference method. The size of the domain is set to 1000 km × 1000 km, with a corresponding regular discretization of 80 × 80. The temporal step size was set to satisfy the Courant–Friedrichs–Lewy condition ($h = 40.41$ s). The data were subsampled to $h = 40.41 \times 10$ s, and 500 time steps were used as training data. The models were validated on a series of length 100. As observations, we randomly sample 1% of the pixels, with a temporal coverage given in Fig. 2.
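As a rough consistency check (our own back-of-the-envelope computation, assuming a resting ocean depth of about 4880 m, which is not stated in the text), the quoted step $h = 40.41$ s is what a two-dimensional gravity-wave CFL bound gives on this grid:

```python
import math

g = 9.81                      # gravity (m/s^2)
depth = 4880.0                # assumed resting depth (not given in the text)
dx = 1000e3 / 80              # 12.5 km grid spacing
c = math.sqrt(g * depth)      # shallow-water gravity-wave speed (~219 m/s)
dt = dx / (c * math.sqrt(2))  # 2D CFL bound for gravity waves
print(round(dt, 2))           # close to the quoted h = 40.41 s
```

The factor $\sqrt{2}$ accounts for waves propagating along the grid diagonal in two dimensions.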

**Parametrization of the Data-Driven Models** The application of the above framework to the spatio-temporal reconstruction of sea surface fields should be considered with care to account for the underlying dimensionality. In this context, and following several related works [14, 9], a patch-based representation is considered in order to reduce the computational complexity of the model. Specifically, this patch-based representation allows a block-diagonal modeling of the covariance

**Fig. 2** Daily performance time series: we report the reconstruction performance of the sea surface elevation and its gradient in (**a**) and (**b**) respectively

matrices, which significantly reduces the computational and memory complexity of the model. This patch-based representation is fully embedded in the considered architecture, making explicit both the extraction of the patches from a 2D field and the reconstruction of a 2D field from the collection of patches. The latter involves a reconstruction operator $\mathcal{F}_r$ which is learned from data.

This patch-level representation is carried out with a fixed shape of 35 × 35 pixels and a 10-pixel overlap between neighboring patches, resulting in a total of 16 overlapping patches. For each patch $\mathcal{P}_i$, $i = 1, \ldots, 16$, we learn an EOF basis $\mathbf{M}_{\mathcal{P}_i}$ from the training data. We keep the first 20 EOF components, which account on average for 95% of the total variance. This patch-based decomposition is shared among all the tested models. The end-to-end Kalman filter architecture (E2EKF) is applied at the patch level with an augmented linear model operating on an embedding of dimension $d_E = 60$. The reconstructed patches are combined through the reconstruction model $\mathcal{F}_r$. This model is implemented as a residual, two-block convolutional neural network. The first block of the network contains four layers with 6 filters of size $k \times k$ (with $k$ ranging from 3 to 17). The second block involves 5 layers: the first four contain 24 filters with a kernel-size distribution similar to that of the first block, and the last layer is a linear convolution with a single filter.
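A minimal sketch (our own, using a clamped tiling that may differ from the paper's exact layout) of the overlapping patch extraction from a 2D field:

```python
import numpy as np

def extract_patches(field, patch=35, overlap=10):
    """Decompose a 2D field into overlapping square patches.
    The stride is patch - overlap; a final patch is clamped to the
    boundary if the stride does not tile the field exactly.
    (The exact tiling used in the paper is not fully specified here.)"""
    H, W = field.shape
    stride = patch - overlap
    def starts(n):
        s = list(range(0, n - patch + 1, stride))
        return sorted(set(s + [n - patch]))   # clamp last patch to boundary
    return np.array([field[i:i + patch, j:j + patch]
                     for i in starts(H) for j in starts(W)])

field = np.arange(80 * 80, dtype=float).reshape(80, 80)
patches = extract_patches(field)
print(patches.shape)   # (9, 35, 35) with this clamped tiling
```

Each extracted patch would then be projected onto its own EOF basis $\mathbf{M}_{\mathcal{P}_i}$, filtered independently, and recombined by the learned operator $\mathcal{F}_r$.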

The proposed technique is compared in this work to the following schemes:

– Data-driven plug-and-play Kalman filter (KF): In order to show the relevance of the proposed end-to-end architecture, its plug-and-play counterpart is also tested. This model exploits the same patch-based augmented linear formulation as the end-to-end one; however, the parameters of the dynamical model are trained based on a forecasting criterion and then plugged into a Kalman filtering scheme.


**Table 1** Surface elevation (*η*) interpolation experiment: reconstruction correlation coefficient and root mean squared error (RMSE) over the elevation time series and their gradient. Bold values denote smallest RMSE and highest percentage correlation

– Analog data assimilation (AnDA): We apply the analog data assimilation framework [14, 7] with a locally linear dynamical kernel and an ensemble Kalman filter scheme. Please refer to [14, 7] for a detailed description of this data-driven approach, which relies on nearest-neighbor regression techniques.

Following [14], an EOF-based post-processing step is applied to all the reconstructions. Furthermore, in this experiment, we only report the reconstruction performance of the mean component, as a relevant benchmark of the uncertainty of the above data-driven models would be out of the scope of this paper. Thus, the model and observation error covariances are assumed to be known matrices with appropriate dimensions, and the training of the proposed model is carried out with $\gamma_1 = 1$ and $\gamma_2 = 0$.

**Reconstruction Performance of the Proposed Data-Driven Models** A quantitative analysis of the benchmark is given in Table 1, based on (i) a mean RMSE criterion and (ii) a mean correlation coefficient criterion for the interpolated fields as well as their gradients. The RMSE and correlation coefficient time series, as well as the spatial coverage of the observations, are also reported in Fig. 2. Overall, the proposed end-to-end architecture leads to very significant improvements with respect to the state-of-the-art AnDA technique, as well as to its plug-and-play counterpart, both in terms of RMSE and correlation coefficients. These results emphasize the importance of the end-to-end methodology with respect to classical plug-and-play techniques since, when considering data assimilation applications, and as shown by [16, 10], the reconstruction performance depends not only on the quality of the dynamical prior but also on the provided measurements and their sampling. Classical plug-and-play techniques, in contrast to end-to-end strategies, ignore the latter source of information, which explains the performance of our framework.

**Qualitative Analysis of the Proposed Schemes** The conclusions of the quantitative analysis are also illustrated through the visual analysis of the reconstructed surface elevation and its gradient in Fig. 3. Interestingly, this visual analysis reveals that the AnDA technique tends to smooth out fine-scale patterns. By contrast, the Kalman-filter-based schemes (in both their end-to-end and plug-and-play versions) achieve a better reproduction of fine-scale structures, illustrated for instance by the gradients

**Fig. 3** Interpolation example of the surface elevation field: first row, the reference surface elevation, its gradient and the observation with missing data; second row, interpolation results using respectively the plug-and-play Augmented Koopman Kalman filter, AnDA, and the proposed E2EKF; third row, gradient of the reconstructed fields

of the field. The analysis of the spectral signatures in Fig. 4 leads to similar conclusions: when compared to the state-of-the-art AnDA technique, as well as to its plug-and-play counterpart, the proposed end-to-end architecture leads to significant improvements, especially regarding the reproduction of the gradient energy levels.

#### **4 Conclusion**

Spatio-temporal interpolation applications are important in the context of ocean surface modeling. For this reason, deriving new data assimilation architectures that can fully exploit the observations and the current advances in signal processing, modeling and artificial intelligence is crucial. In this context, this work investigated the ability of augmented linear state-space models to solve smoothing problems for ocean surface observations using the Kalman filter.

**Fig. 4** Spectral comparison of the tested models: the averaged power spectral densities and their error with respect to the ground truth are given in (**a**) and (**b**) respectively

Beyond filtering and smoothing applications, we believe that the proposed framework provides an initial playground for learning approximate linear state-space models of real observations. Given a sequence of sparse observations, the proposed framework may be able to uncover large-scale frequencies that are useful for prediction. Interesting case studies include sea level rise and the increase of the sea surface temperature anomaly.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Dynamical Properties of Weather Regime Transitions**

**Paul Platzer, Bertrand Chapron, and Pierre Tandeo**

**Abstract** Large-scale weather can often be successfully described using a small number of patterns. A statistical description of reanalysed pressure fields identifies these recurring patterns with clusters in state-space, also called "regimes". Recently, these weather regimes have been described through instantaneous, local indicators of dimension and persistence, borrowed from dynamical systems theory and extreme value theory. Using similar indicators and going further, we focus here on weather regime transitions. We use 60 years of winter-time sea-level pressure reanalysis data centered on the North-Atlantic ocean and western Europe. These experiments reveal regime-dependent behaviours of dimension and persistence near transitions, although on average one observes an increase of dimension and a decrease of persistence near transitions. The effect of transitions on persistence is stronger and lasts longer than on dimension. These findings confirm the relevance of such dynamical indicators for the study of large-scale weather regimes, and reveal their potential for both the understanding and detection of weather regime transitions.

**Keywords** Weather · Regime · Transition · Shift · Dynamical systems · Dimension · Persistence

#### **1 Introduction**

The concept of weather regime was introduced in 1949 by [1]. Broadly speaking, weather regimes are recurring, quasi-stationary states of the atmosphere, which allow one to describe most of the subseasonal variability of atmospheric states, the

P. Platzer (-) · B. Chapron

Laboratoire d'Océanographie Physique et Spatiale (LOPS), Ifremer, Plouzané, France e-mail: paul.platzer@ifremer.fr

P. Tandeo Lab-STICC, UMR CNRS 6285, IMT Atlantique, Plouzané, France

latter being defined through large-scale maps of either mean sea-level pressure or geopotential height. The study of weather regimes has numerous potential applications as a tool to understand subseasonal atmospheric dynamics [2]. The understanding and correct representation of weather regimes is also paramount for adequate climate projections [3].

Vautard [4] defines weather regimes through stationarity and searches for geopotential fields with a quasi-vanishing time-derivative. Others (see e.g. [5]) use cluster analysis (e.g., k-means or Gaussian Mixture Models) to find recurring patterns. To perform such analyses, one usually uses a low-order description of the atmospheric state, through empirical orthogonal functions (EOFs). Some authors simply rely on a projection onto a low number of EOFs (two in the case of [6]), and on forecasters' empirical knowledge of the recurrence of regimes defined through positive and negative phases of dominant EOFs.
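Such an analysis pipeline, EOF reduction followed by clustering, can be sketched with scikit-learn; the synthetic random fields below merely stand in for reanalysis data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for daily SLP-anomaly maps (n_days x n_gridpoints);
# the actual study uses winter-time reanalysis fields, not random data.
rng = np.random.default_rng(0)
slp_anomaly = rng.standard_normal((600, 500))

pca = PCA(n_components=3)               # three leading EOFs
pcs = pca.fit_transform(slp_anomaly)    # reduced (principal-component) state

gmm = GaussianMixture(n_components=4, random_state=0).fit(pcs)
regimes = gmm.predict(pcs)              # one regime label per day
```

On real data, the four Gaussian components would correspond to the four weather regimes, and regime transitions appear as changes in the daily label sequence.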

A natural concern is not only the definition of weather regimes, but also the study of their transitions [5]. Statistical tools such as random forests can be used to perform such a task [7]. The performance of physics-based weather forecasts can also be assessed through their ability to predict weather regime transitions [6]. Our study of weather regime transitions is notably motivated by the relevance and difficulty of their forecast.

We aim to focus on the time-evolution of two dynamical indicators (local dimension and persistence) around transitions between winter-time, North-Atlantic weather regimes. These indicators are relevant to the study of Atlantic-European weather regimes, as each weather regime can be associated with specific values of these indicators [8]. From this static study of weather regimes, we carry on with a dynamic study of transitions.
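The local dimension indicator can be sketched via the extreme-value approach used in this literature (our own minimal version; the companion persistence estimator, e.g. Süveges', is omitted):

```python
import numpy as np

def local_dimension(trajectory, zeta, quantile=0.98):
    """Instantaneous local dimension at state zeta: exceedances of
    g = -log(distance to zeta) above a high quantile are asymptotically
    exponential with mean 1/d, so d = 1 / mean(exceedance)."""
    dist = np.linalg.norm(trajectory - zeta, axis=1)
    g = -np.log(dist[dist > 0])          # drop zeta itself if present
    thresh = np.quantile(g, quantile)
    exceed = g[g > thresh] - thresh
    return 1.0 / exceed.mean()

# Sanity check on synthetic data: for points drawn from a smooth 2D
# distribution, the local dimension should be close to 2.
rng = np.random.default_rng(0)
cloud = rng.standard_normal((20000, 2))
d = local_dimension(cloud, np.zeros(2))
```

Evaluated at each day of the pressure-field trajectory, this yields the time series of local dimension whose behaviour around regime transitions is studied below.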

Note that [9] already investigated the temporal behaviour of local dimension and persistence at the mature stage of seven regimes, used to define round-year sub-seasonal variability of weather over the North-Atlantic and western Europe. These mature stages were identified as local minima of the weather regime index defined by [10] as the projection of the instantaneous atmospheric state on the atmospheric state associated with each regime. Hochman et al. [9] showed that the so-defined mature stages of weather regimes coincided with locally low values of the dimension and inverse persistence, and that these mature stages were both preceded and followed by higher relative values of these indicators. The present paper is concerned with weather regime transitions, which are located between weather regime mature stages. We therefore expect to confirm the relatively higher values of dimension and persistence observed by [9] before and after regime mature stages. However, our study could reveal varying behaviours, as we focus on transitions from one specific regime to another, while the study of [9] does not specify which regime precedes or follows a given mature stage.

Our analysis also bears similarity to that of [11], which examined the temporal behaviour of local dimension and persistence during Eastern Mediterranean cold spells. The main difference with the present study is the nature of the event of interest: we are interested in transitions between weather regimes, while cold spells can be viewed as a special type of weather regime (a particular case of Cyprus Lows, the dominant regime responsible for precipitation in the Eastern Mediterranean region).

The next section is the core of our paper and reviews the results of our study, describing salient features of the time evolution of dimension and persistence around transitions between four winter-time North-Atlantic weather regimes. The following section draws perspectives and proposes potential applications to real-world meteorological issues. Appendix sections provide details of the tools and data used in the present study.

#### **2 European-Atlantic Weather Regime Transitions**

An EOF decomposition (see section "Empirical Orthogonal Functions") is performed on the winter-time, reanalysed sea-level pressure fields described in Appendix 1. A weather-regime analysis follows, using a Gaussian Mixture Model with four modes, corresponding to four weather regimes, in a reduced space spanned by the first three EOFs (see section "Gaussian Mixture Model" for a discussion). The resulting regimes are shown in Fig. 1 in EOF space, and their centroids are shown in Fig. 2 as SLP-anomaly maps.

Figure 1 illustrates that the four regimes are mostly defined through EOF1 and EOF2, as the centroids' EOF3-coordinates are close to zero. Two regimes are associated with the positive and negative phases of the first EOF, corresponding to a strong north-south pressure gradient (see Fig. 2), and we label these regimes NAO+ and NAO− to match previous works in the literature. The two other regimes are

**Fig. 1** Weather regimes as cluster distributions from the fit of a Gaussian Mixture Model to winter-time sea-level-pressure anomaly (SLPa) from reanalysis data. The fit is performed in reduced space through projection of SLPa maps on the three leading empirical orthogonal functions (EOF). Colored contours show the 0.75*σ* (thick lines) and 1.25*σ* (thin lines) ellipses of each distribution around their centroids, with *σ* denoting standard deviation. Grey contours show the whole GMM distribution through marginal distributions in two-dimensional EOF-subspaces. Regime names are assigned from comparison with other scientific studies found in the literature (see Fig. 2)

**Fig. 2** Weather regimes as sea-level-pressure anomalies in (longitude, latitude) coordinates (coastlines are shown), defined by the distributions' centroids from a Gaussian Mixture Model (see Fig. 1 and section "Gaussian Mixture Model"). Regime names are assigned from comparison with other scientific studies found in the literature

associated with a pressure system covering western Europe and extending far off Europe's west coast. The regime corresponding to an anticyclonic situation over western Europe is termed BLO+, and its opposite phase is termed BLO−, in accordance with previous studies of such regimes. Note that the small contribution of EOF3 to the definition of BLO+ and BLO− induces a slight westward shift of the BLO− pressure system compared to that of BLO+.

Then, we follow [5] and assign each SLP-anomaly field to a weather regime if it lies inside the 1.25*σ* ellipses shown in Fig. 1 (when a point belongs to two regimes, we assign the regime with the highest probability); otherwise, no regime is assigned. Next, for any regimes "A" and "B", a transition from regime "A" to regime "B" is defined as either the direct passage from "A" to "B" or the passage from "A" to "no regime" and then to "B" (note that this allows transitions from a regime to itself). As we are interested in the behaviour of dynamical indicators around transitions, we discard transitions of the type "A"→"no regime"→"B" if the "no regime" phase exceeds 24 h.
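As an illustrative sketch (not the study's actual code), the assignment rule and the gap-tolerant transition detection described above might be implemented as follows; the synthetic reduced-space coordinates `pcs` stand in for the EOF projections, and all names are our own.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
pcs = rng.standard_normal((500, 3))  # stand-in for the three leading EOF coefficients

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0).fit(pcs)

def assign_regime(x, gmm, radius=1.25):
    """Regime index of x, or -1 ("no regime") if x lies outside every
    radius-sigma Mahalanobis ellipse; ties go to the highest posterior."""
    inside = [k for k in range(gmm.n_components)
              if (x - gmm.means_[k]) @ np.linalg.inv(gmm.covariances_[k])
                 @ (x - gmm.means_[k]) <= radius**2]
    if not inside:
        return -1
    post = gmm.predict_proba(x[None, :])[0]
    return max(inside, key=lambda k: post[k])

def transitions(labels, max_gap=8):
    """(A, B) transitions, allowing a "no regime" (-1) gap of at most
    max_gap samples (8 x 3 h = 24 h); A -> A counts only across a gap."""
    out, prev, gap = [], None, 0
    for lab in labels:
        if lab == -1:
            gap += 1
            continue
        if prev is not None and (lab != prev or gap > 0) and gap <= max_gap:
            out.append((prev, lab))
        prev, gap = lab, 0
    return out

labels = np.array([assign_regime(x, gmm) for x in pcs])
```

Direct repeats of the same label are deliberately not counted, so auto-transitions only arise across a "no regime" gap, as in the definition above.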

#### **3 Dimensionality Around Transitions**

The local dimension of sea-level pressure fields is used as an indicator of the state of the atmosphere. Details on this indicator and how it is computed can be found in section "Local Dimensions".

Figure 3 shows statistics of dimension-versus-time profiles centered on transitions. The number of transitions on which the statistics were computed is also indicated, showing preferred transitions in agreement with [5]. Several behaviours can be observed.

**Fig. 3** Typical profiles of local dimension versus time, centered at the transition point, for each possible transition. Light (resp. dark) grey fills the area between the 0.05 and 0.95 (resp. 0.25 and 0.75) quantiles, while the dark lines show the average dimension profile around transitions from regime "A" to regime "B". In red, statistics over each regime (with no restriction to transitions) are shown. Red dotted (resp. dashed) lines show the 0.05 and 0.95 (resp. 0.25 and 0.75) quantiles, while the full red lines show the average dimension of regimes "A" and "B"


Auto-transitions are harder to interpret than normal transitions. They correspond to trajectories in phase space where the system goes from a well-defined regime to a mixed, undefined state, and then comes back to the initial well-defined regime. It is likely that these auto-transitions actually mix different types of transient behaviours, with different properties. The auto-transition NAO+→NAO+ seems to show an overshoot of dimension near the transition point, but the number of such transitions (57) is small and therefore only low confidence is attributed to these statistics. Other auto-transition statistics are rather smooth and close to the corresponding regime statistics, which is consistent with auto-transitions mixing different types of transient behaviours.

Figure 5b shows dimension statistics for all transitions, excluding auto-transitions. It shows a slight dimension overshoot within ±1 day of the transition point. The smallness of this overshoot reflects the variety of behaviours near transitions, depending on which regimes are involved.

#### **4 Persistence Around Transitions**

We now use the inverse persistence *θ* (also called extremal index) of sea-level pressure fields as an indicator of the state of the atmosphere. Details on this indicator and how it is computed can be found in section "Inverse Persistence *θ*".

In Fig. 4, we show the result of the same procedure as in the previous section, with the local dimension replaced by the inverse persistence. As these two variables are correlated, the behaviour of the inverse persistence resembles that of the dimension around most of the observed transitions. However, the differences between transition statistics and regime statistics appear to be more significant for *θ* than for the dimension, with some special behaviours described below.

**Fig. 4** Same as Fig. 3, but for the inverse persistence *θ* (also called extremal index). High values indicate a rapidly changing dynamical system

**Transitions to/from BLO+** The BLO+ regime statistics of *θ* are much higher than those of other regimes, with most values concentrated between 0.17 and 0.19, and almost all values above 0.16. We therefore see large variations of *θ* around transitions from or to BLO+. However, when the system is in the regime BLO+, either after or before a transition, we do not observe an overshoot as with the dimension. Rather, we see that the transition statistics match the BLO+ statistics very near the transition point, while they are much lower 2–3 days away from the transition. This means that, in the regime BLO+, the inverse persistence is much lower either 2–3 days before or 2–3 days after any transition. Also, the values of *θ* in regimes NAO± and BLO−, up to at least three days around a transition from or to BLO+, are much higher than expected from intra-regime statistics.

We can interpret these facts using the results of [9], who observed a strong decrease of *θ* when weather regimes are well-installed. Therefore, what we see in Figs. 3d, h, l, m–p and 4d, h, l, m–p indicates that the system rapidly exits/enters regime BLO+, while it needs more time to exit/enter neighbouring regimes when transitioning from or to BLO+.


**Fig. 5** In grey: statistics (0.05, 0.25, 0.75 and 0.95 quantiles, as well as mean) of inverse persistence (**a**) and local dimension (**b**) over all transitions, discarding auto-transitions (from regime "A" to "A"). In red: statistics (0.05, 0.25, 0.75 and 0.95 quantiles, as well as mean) over all values from the dataset (winter-time from 1956 to 2015), without restriction to transitions

As mentioned earlier, we discard transitions "A"→"B" if the "no regime" phase between regimes "A" and "B" exceeds 24 h. Raising the maximum length of this "no regime" phase yields more transitions and results in a slight smoothing of the profiles of Figs. 3 and 5, but the observed tendencies remain. Reducing the maximum length of the "no regime" phase between regimes "A" and "B" results in slightly sharper, yet noisier profiles (not shown).

#### **5 Conclusion and Perspectives**

The analysis of reanalysed sea-level pressure maps covering a large part of the North-Atlantic Ocean and western Europe demonstrates that the local dynamical indicators of dimension and persistence display great sensitivity to transitions between weather regimes. In particular, we observe higher values of dimension and lower values of persistence near transitions, which agrees both with the early definition of weather regimes (as quasi-stationary, low-order recurring states) and with recent studies of weather regimes through these same two dynamical indicators. The study reveals non-homogeneous behaviour of these indicators near transitions, meaning that different transitions show different signatures in terms of the time variation of dimension and persistence. Furthermore, we observe that the fingerprint of transitions is more pronounced for persistence than for dimension, and that it spreads over a larger duration (more than ±3 days for persistence but around ±1.5 days for dimension).

This study, combined with recent studies on weather regimes and dynamical indicators, confirms the relevance of these indicators for the understanding of weather regimes, and even reveals their potential for use in the definition of weather regimes. The present findings also indicate that each transition could be identified through the time behaviour of dimension and persistence. This has great implications and shall motivate further investigations on how to use these indicators for the purpose of detecting regime transitions. However, for each transition we still observe a great variability of the time profiles of dimension and persistence. This suggests using a variety of related indicators, not only these two. Recent studies have used these indicators on separated scales, allowing exploration of variations in the dimensionality and persistence of small-scale variables [23]. Our current analyses also reveal a signature of large-scale weather regime transitions in the time variation of small-scale dimension and persistence, albeit with less intensity than for large-scale dynamical indicators (not shown). We interpret this as a hint that small-scale organization may be necessary for large-scale transitions. Other local indicators also based on analogues, such as the ones used by [24] and [25], shall also be considered in an attempt to predict transitions.

**Acknowledgments** We thank Pierre Ailliot for fruitful discussions on Gaussian Mixture Models. This work was financially supported by the ERC project 856408-STUOD. Support for the Twentieth Century Reanalysis Project version 3 dataset is provided by the U.S. Department of Energy, Office of Science Biological and Environmental Research (BER), by the National Oceanic and Atmospheric Administration Climate Program Office, and by the NOAA Physical Sciences Laboratory. We thank the anonymous reviewer for helpful comments and suggestions.

#### **Appendix 1: Data Description: Twentieth Century Reanalysis**

We use data from the third version of the Twentieth Century Reanalysis, which combines surface observations of synoptic pressure with NOAA's Global Forecast System, and prescribes sea surface temperature and sea ice distribution [12].

From this reanalysis we extract the ensemble-mean, sea-level pressure maps from 1956 to 2015, at 3-h intervals. We do not use preceding years in order to avoid inconsistency between past, observation-scarce data and more recent data, better constrained by observations. We could also have selected only data from the satellite era starting in 1979, but this would have diminished the statistical significance of our work.

We focus on a 41×41 grid at 1◦-resolution covering longitudes 30W≤LON≤10E and latitudes 30N≤LAT≤70N, including western Europe and the eastern part of the North-Atlantic Ocean (see Fig. 2). We use only extended-winter data, from October to March, as is typical in North-Atlantic weather-regime studies (see e.g., [9, 6, 8]).

#### **Appendix 2: Statistical Descriptors**

#### *Empirical Orthogonal Functions*

To study winter-time SLP fields, we use the empirical orthogonal function decomposition, also called principal component analysis [13]. It decomposes any spatial field (snapshot) of SLP anomaly (SLPa) onto orthogonal maps (EOFs), ordered by their respective contributions to the total variability in time of the SLPa fields. To compute SLPa, we remove a moving seasonal average using data from ±10 years and ±5 calendar days, with a Gaussian kernel giving more weight to neighbouring years and calendar days.

In our case, EOFs n◦1–7 contribute 41%, 24%, 14%, 5.5%, 4.8%, 2.2% and 1.5% of the total signal variance, respectively. Note that, for our analyses of weather regimes, we use only EOFs n◦1–3, which contribute collectively 79% of the total variance.
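The EOF computation itself can be sketched with a plain singular value decomposition. In the sketch below, the data are a synthetic stand-in, the seasonal-cycle removal described above is simplified to a time-mean removal for brevity, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for SLP-anomaly snapshots: n_time maps flattened to n_grid points
X = rng.standard_normal((1000, 41 * 41))
X = X - X.mean(axis=0)                # anomalies with respect to the time mean

U, s, Vt = np.linalg.svd(X, full_matrices=False)
eofs = Vt                             # rows: orthonormal EOF maps
pcs = U * s                           # principal components (projections on EOFs)
var_frac = s**2 / np.sum(s**2)        # fraction of total variance per EOF

pcs3 = pcs[:, :3]                     # reduced space spanned by the 3 leading EOFs
```

The singular values come out sorted, so `var_frac` directly gives the ranked variance fractions quoted above, and `pcs3` is the reduced space in which the Gaussian Mixture Model of the next section is fitted.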

#### *Gaussian Mixture Model*

A Gaussian Mixture Model (GMM) assumes that the random variable it describes results from pooling a finite number of sub-populations (in our case, regimes) whose distributions are Gaussian [14]. The expectation-maximization (EM) algorithm finds optimal parameters (means and covariances) of the Gaussian distributions, once the number of regimes has been fixed.

We follow [5] and make a GMM EM-fit using a finite number of EOFs. As we allow the covariances to have any possible shape, the number of parameters to be optimized grows rapidly with the number of EOFs kept; we therefore have not tried using more than 5 EOFs. Then, once the number of EOFs is fixed, a trade-off between the number of parameters (dictated by the number of regimes) and the model's adequacy to the data can be found by computing either the Bayesian Information Criterion or the average log-likelihood over an independent set [16]. However, as in the study by [5], we find a very low sensitivity of these indicators to the number of regimes chosen (not shown). We also compute the Silhouette score proposed by [15] to estimate the degree of overlap between regimes, and find that using more EOFs always leads to more overlap, as does using more regimes, though to a lesser extent (not shown).
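A minimal model-selection loop of the kind just described might look as follows; the data are synthetic placeholders and the loop is our sketch, with scikit-learn's `bic`, `score` and `silhouette_score` standing in for the three criteria.

```python
import numpy as np
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
pcs = rng.standard_normal((600, 3))      # stand-in for the retained EOF coefficients
train, valid = pcs[:400], pcs[400:]      # independent set for the log-likelihood

scores = {}
for k in range(2, 7):                    # candidate numbers of regimes
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          n_init=5, random_state=0).fit(train)
    labels = gmm.predict(train)
    scores[k] = {
        "bic": gmm.bic(train),                          # lower is better
        "loglik": gmm.score(valid),                     # held-out avg log-likelihood
        "silhouette": (silhouette_score(train, labels)  # proxy for regime overlap
                       if len(set(labels)) > 1 else float("nan")),
    }
```

On real SLPa projections one would compare the entries of `scores` across `k`; on the synthetic data used here the criteria are of course uninformative.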

In the end, we choose to keep 3 EOFs and 4 regimes. The choice of 3 EOFs is motivated by the fact that each of the first three EOFs accounts for more than 10% of the total variance, while EOFs n◦4 and beyond only represent up to ∼5% each. As a consequence, even when we retain more than 3 EOFs, the regime centroids found through GMM EM-fits are mostly defined by their projections on the first 3 EOFs, as the projections on EOFs 4 and 5 are always closer to 0 than the other projections (not shown). The choice of 4 regimes is motivated by consistency with other studies [6] and with operational weather-forecasting services such as ECMWF, which divide into 4 quadrants the reduced space formed by the projection of geopotential height fields onto their corresponding first two EOFs.

#### **Appendix 3: Dynamical Indicators**

#### *Local Dimensions*

We use the same estimator of local dimension as [8], borrowing the python code from the Chaotic Dynamical Systems Kit (https://github.com/yrobink/CDSK). This estimator is based on a definition of the local dimension at any point $z$ in state space through the extreme-value distribution of the observable $g_z : x \mapsto g_z(x) = -\log \operatorname{dist}(z, x)$ for any other state-space vector $x$ (where "dist" is any distance in the mathematical sense). Large values of this observable are found for points $x$ close to $z$: these points are called "analogues" of $z$ in the atmospheric- and ocean-sciences community. Then, the probability that $g_z(x)$ exceeds a given threshold $\rho$ is exponential (see, for instance, [17]):

$$P\left(g_z(x) > \rho\right) \propto \exp(-\rho\, d(z))\,,\tag{A.1}$$

where $d(z)$ is the local dimension that we estimate here. The geometric interpretation of this dimension is that, in a space of dimension $d$, the typical number of points inside a sphere of radius $r$ scales as $r^d$. Although such an interpretation of dimension has long been connected to the distances to analogues (see for instance [18] and the famous Grassberger–Procaccia algorithm [19]), only recent works have used extreme-value theory to provide instantaneous, local estimators of dimension [20]. These recent tools are particularly suited to the study of local behaviours, while previous works focused on average, global indicators.

Recently, the distances between analogues $x$ and their target $z$ have been shown to follow distributions whose parameters are given by the length of the available dataset, the analogue rank, and the local dimension as estimated in this paper [21]. This indicator is thus relevant both from a dynamical-systems point of view and for the practical use of data-based methods.
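As an illustration of the estimator (a sketch in the spirit of the CDSK code, not a copy of it), the exponential law (A.1) turns the local dimension into the inverse mean exceedance of $g_z$ above a high quantile; the data below are a synthetic stand-in.

```python
import numpy as np

def local_dimension(z, X, q=0.98):
    """EVT estimate of the local dimension d(z): exceedances of
    g_z = -log dist(z, .) over its q-quantile are exponentially
    distributed with rate d(z), so d(z) = 1 / (mean exceedance)."""
    g = -np.log(np.linalg.norm(X - z, axis=1))
    g = g[np.isfinite(g)]             # drop z itself if present in X
    thresh = np.quantile(g, q)
    return 1.0 / np.mean(g[g > thresh] - thresh)

rng = np.random.default_rng(3)
X = rng.standard_normal((20000, 3))   # stand-in states sampled in a 3-D space
d_hat = local_dimension(X[0], X[1:])  # should come out close to 3
```

For a cloud of points filling a 3-dimensional space the estimate recovers the geometric dimension, consistent with the $r^d$ scaling interpretation above.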

#### *Inverse Persistence θ*

However, Eq. A.1 is not valid when the system passes close to a fixed point, as this causes trajectories to slow down. In this case, another parameter called the extremal index, or inverse persistence, comes into play:

$$P\left(g_z(x) > \rho\right) \propto \exp(-\rho\, \theta(z)\, d(z))\,,\tag{A.2}$$

with $0 < \theta(z) \le 1$. Low values of $\theta$ correspond to highly persistent areas of state space. It can be interpreted as the inverse mean residence time within a sphere centered on $z$ (if divided by the time increment between two consecutive points in the dataset, which is 3 h in our case). We estimate this parameter with the Süveges likelihood estimator [22], which is based on counting consecutive points inside a ball centered on $z$ (i.e., analogues of the same point $z$ that are also consecutive points in the time-ordered dataset).
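For illustration, a compact version of the Süveges estimator might read as follows; this is our hedged reading of [22] (see the reference for the exact likelihood derivation), with synthetic, serially independent data, on which $\theta$ should come out close to 1.

```python
import numpy as np

def extremal_index(z, X, q=0.98):
    """Süveges-type likelihood estimate of theta(z) from the
    inter-exceedance times of g_z = -log dist(z, .) above its q-quantile."""
    g = -np.log(np.linalg.norm(X - z, axis=1))
    exceed = np.flatnonzero(g > np.quantile(g, q))
    Si = np.diff(exceed) - 1.0        # inter-exceedance times minus one step
    Nc = np.count_nonzero(Si > 0)     # gaps longer than a single time step
    N = len(Si)
    s = (1.0 - q) * Si.sum()          # exceedance probability times gap lengths
    a = s + N + Nc
    return (a - np.sqrt(a**2 - 8.0 * Nc * s)) / (2.0 * s)

rng = np.random.default_rng(4)
X = rng.standard_normal((20000, 3))   # independent draws: no persistence
theta_hat = extremal_index(X[0], X[1:])
```

On a persistent time series, consecutive analogues shorten the inter-exceedance times and drive the estimate below 1, matching the interpretation of $\theta$ as an inverse residence time.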

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Frequentist Perspective on Robust Parameter Estimation Using the Ensemble Kalman Filter**

**Sebastian Reich**

**Abstract** Standard maximum likelihood or Bayesian approaches to parameter estimation for stochastic differential equations are not robust to perturbations in the continuous-in-time data. In this paper, we give a rather elementary explanation of this observation in the context of continuous-time parameter estimation using an ensemble Kalman filter. We employ the frequentist perspective to shed new light on two robust estimation techniques; namely subsampling the data and rough path corrections. We illustrate our findings through a simple numerical experiment.

**Keywords** Parameter estimation · Stochastic differential equations · Ensemble Kalman filter · Frequentist approach · Rough path theory

#### **1 Introduction**

In this note, we consider the well-studied problem of parameter estimation for stochastic differential equations (SDEs) from continuous-time observations $X_t^\dagger$, $t \in [0, T]$ [25]. It is well known that the corresponding maximum likelihood estimator does not depend continuously on the observations $X_t^\dagger$, $t \in [0, T]$, which can result in a systematic estimation bias [27, 14]. In other words, the maximum likelihood estimator is not robust with respect to perturbations in the observations. Here, we revisit this problem from the perspective of online (time-continuous) parameter estimation [6, 11] using the popular ensemble Kalman filter (EnKF) and its continuous-time ensemble Kalman–Bucy filter (EnKBF) formulations [15, 10, 26]. As for the corresponding maximum likelihood approaches, the EnKBF does not depend continuously on the incoming observations $X_t^\dagger$, $t \ge 0$, with respect to the uniform norm topology on the space of continuous functions. This fact was first investigated in [9] using rough path theory [16]. In particular, as already

S. Reich (-)

Institute of Mathematics, University of Potsdam, Potsdam, Germany e-mail: sebastian.reich@uni-potsdam.de

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*,

Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_15

demonstrated for the related maximum likelihood estimator in [14], rough path theory allows one to specify an appropriately generalised topology which leads to a continuous dependence of the EnKBF estimators on the observations. Here we expand the analysis of [9] to a frequentist analysis of the EnKBF in the spirit of [29], where the primary focus is on the expected behaviour of the EnKBF estimators over all admissible observation paths. One recovers that the discontinuous dependence of the EnKBF estimators on the driving observations results in a systematic bias from a frequentist perspective. This is also a well-known fact for SDEs driven by multiplicative noise [23].

The proposed frequentist perspective naturally enables the study of known bias correction methods, such as subsampling the data [27], as well as novel de-biasing approaches in the context of the EnKBF.

In order to facilitate a rather elementary mathematical analysis, we consider only the much simplified problem of parameter estimation for linear SDEs. This restriction allows us to avoid certain technicalities from rough path theory and enables a rather straightforward application of the numerical rough path approach put forward in [13]. As a result, we are able to demonstrate that the popular approach of subsampling the data [2, 27, 5] can be well justified from a frequentist perspective. The frequentist perspective also suggests a rather natural approach to the estimation of the required correction term in case an EnKBF is implemented without subsampling.

We end this introductory paragraph with a reference to [1], which includes a broad survey of alternative estimation techniques. We also point to [9] for an in-depth discussion of rough path theory in connection with filtering and parameter estimation.

The remainder of this paper is structured as follows. The problem setting and the EnKBF are introduced in the subsequent Sect. 2. The frequentist perspective and its implications for specific implementations of an EnKBF in the context of low- and high-frequency data assimilation are laid out in Sect. 3. The importance of these considerations becomes transparent when applying the EnKBF to perturbed data in Sect. 4. Here again, we restrict attention to a rather simple model setting taken from [17] and also used in [9]. As a result, we build a clear connection between subsampling and the necessity of a correction term when high-frequency data are assimilated directly. A brief numerical demonstration is provided in Sect. 5, which is followed by a concluding remark in Sect. 6.

#### **2 Ensemble Kalman Parameter Estimation**

We consider the SDE parameter estimation problem

$$\mathrm{d}X_t = f(X_t, \theta)\,\mathrm{d}t + \gamma^{1/2}\,\mathrm{d}W_t \tag{1}$$

subject to observations $X_t^\dagger$, $t \in [0, T]$, which arise from the reference system

$$\mathrm{d}X_t^\dagger = f^\dagger(X_t^\dagger)\,\mathrm{d}t + \gamma^{1/2}\,\mathrm{d}W_t^\dagger,\tag{2}$$

where the unknown drift function $f^\dagger(x)$ typically satisfies $f^\dagger(x) = f(x, \theta^\dagger)$ and $\theta^\dagger$ denotes the true parameter value. Here we assume for simplicity that the unknown parameter is scalar-valued and that the state variable is $d$-dimensional with $d \ge 1$. Furthermore, $W_t$ and $W_t^\dagger$ denote independent standard $d$-dimensional Brownian motions and $\gamma > 0$ is the (known) diffusion constant.

Following the Bayesian paradigm, we treat the unknown parameter as a random variable $\Theta$. Furthermore, we apply a sequential approach and update $\Theta$ with the incoming data $X_t^\dagger$ as a function of time. Hence we introduce the random variable $\Theta_t$, which obeys the Bayesian posterior distribution given all observations $X_\tau^\dagger$, $\tau \in [0, t]$, up to time $t > 0$. Furthermore, instead of exactly solving the time-continuous Bayesian inference problem as specified by the associated Kushner–Stratonovich equation [6, 26], we define the time evolution of $\Theta_t$ by an application of the (deterministic) ensemble Kalman–Bucy filter (EnKBF) mean-field equations [10, 26], which take the form

$$\mathrm{d}\Theta_t = \gamma^{-1}\pi_t\left[(\theta - \pi_t[\theta]) \otimes f(X_t^\dagger, \theta)\right]\mathrm{d}I_t,\tag{3a}$$

$$\mathrm{d}I_t = \mathrm{d}X_t^\dagger - \frac{1}{2}\left(f(X_t^\dagger, \Theta_t) + \pi_t[f(X_t^\dagger, \theta)]\right)\mathrm{d}t,\tag{3b}$$

where $\pi_t$ denotes the probability density function (PDF) of $\Theta_t$ and $\pi_t[g]$ the associated expectation value of a function $g(\theta)$. The column vector $I_t$, defined by (3b), is called the innovation, while the row vector

$$K_t(\pi_t) = \gamma^{-1}\pi_t\left[(\theta - \pi_t[\theta]) \otimes f(X_t^\dagger, \theta)\right],\tag{4}$$

premultiplying the innovation in (3a) is called the gain. Here the notation $a \otimes b = ab^{\mathrm{T}}$, where $a, b$ can be any two column vectors, has been used. The initial condition $\Theta_0 \sim \pi_0$ is provided by the prior PDF of the unknown parameter.

A Monte-Carlo implementation of the mean-field equations (3) leads to the interacting particle system

$$\mathrm{d}\Theta_t^{(i)} = \gamma^{-1}\pi_t^M\left[(\theta - \pi_t^M[\theta]) \otimes f(X_t^\dagger, \theta)\right]\mathrm{d}I_t^{(i)},\tag{5a}$$

$$\mathrm{d}I_t^{(i)} = \mathrm{d}X_t^\dagger - \frac{1}{2}\left(f(X_t^\dagger, \Theta_t^{(i)}) + \pi_t^M[f(X_t^\dagger, \theta)]\right)\mathrm{d}t,\tag{5b}$$

$i = 1, \ldots, M$, where expectations are now taken with respect to the empirical measure. That is,

$$\pi_t^M[g] = \frac{1}{M}\sum_{i=1}^M g(\Theta_t^{(i)}) \tag{6}$$

for a given function $g(\theta)$, and all Monte-Carlo samples are driven by the same (fixed) observations $X_t^\dagger$. The initial samples $\Theta_0^{(i)}$, $i = 1, \ldots, M$, are drawn identically and independently from the prior distribution $\pi_0$.

We note in passing that there is also a stochastic variant of the innovation process [26] defined by

$$\mathrm{d}I_t = \mathrm{d}X_t^\dagger - f(X_t^\dagger, \Theta_t)\,\mathrm{d}t - \gamma^{1/2}\,\mathrm{d}W_t,\tag{7}$$

which leads to the Monte-Carlo approximation

$$\mathrm{d}I_t^{(i)} = \mathrm{d}X_t^\dagger - f(X_t^\dagger, \Theta_t^{(i)})\,\mathrm{d}t - \gamma^{1/2}\,\mathrm{d}W_t^{(i)} \tag{8}$$

of the innovation in (5).

*Remark 1* There is an intriguing connection to the stochastic gradient descent approach to the estimation of $\theta^\dagger$, as proposed in [30], which is written as

$$\mathrm{d}\theta_t = \frac{\alpha_t}{\gamma}\nabla_\theta f\left(X_t^\dagger, \theta_t\right)\mathrm{d}\tilde{I}_t,\tag{9a}$$

$$\mathrm{d}\tilde{I}_t = \mathrm{d}X_t^\dagger - f(X_t^\dagger, \theta_t)\,\mathrm{d}t \tag{9b}$$

in our notation, where $\alpha_t > 0$ denotes the learning rate. We note that (9) shares with (3) the gain-times-innovation structure. However, while (3) approximates the Bayesian inference problem, formulation (9) treats the parameter estimation problem from an optimisation perspective. Both formulations share, however, the discontinuous dependence on the observation path $X_t^\dagger$, and the proposed frequentist analysis of the EnKBF (3) also applies in simplified form to (9). We also point out that (3) is affine invariant [18] and does not require the computation of partial derivatives.

We now state a numerical implementation with step-size $\Delta t > 0$ and denote the resulting numerical approximations at $t_n = n\Delta t$ by $\Theta_n \sim \pi_n$, $n \ge 1$. While a standard Euler–Maruyama approximation could be applied, the following stable discrete-time mean-field formulation of the EnKBF

$$\Theta_{n+1} = \Theta_n + K_n\left\{(X_{t_{n+1}}^\dagger - X_{t_n}^\dagger) - \frac{1}{2}\left(f(X_{t_n}^\dagger, \Theta_n) + \pi_n[f(X_{t_n}^\dagger, \theta)]\right)\Delta t\right\} \tag{10}$$

is inspired by [3] with Kalman gain

$$K_n = \pi_n\left[(\theta - \pi_n[\theta]) \otimes f(X_{t_n}^\dagger, \theta)\right] \times \tag{11a}$$

$$\left(\gamma + \Delta t\, \pi_n\left[\left(f(X_{t_n}^\dagger, \theta) - \pi_n[f(X_{t_n}^\dagger, \theta)]\right) \otimes f(X_{t_n}^\dagger, \theta)\right]\right)^{-1}.\tag{11b}$$

It is straightforward to combine this time discretisation with the Monte-Carlo approximation (5) in order to obtain a complete numerical implementation of the EnKBF.
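For concreteness, here is a sketch of such a complete implementation of (10)–(11) combined with an $M$-member ensemble, for the linear drift $f(x, \theta) = \theta A x$ of (15) with $A = -I$, $\gamma = 1$ and $\theta^\dagger = 1$; the reference trajectory, parameter values, and all variable names are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(5)
d, M, dt, nsteps = 2, 200, 0.01, 10000
gamma, theta_true = 1.0, 1.0
A = -np.eye(d)                          # normal matrix, spectrum in the left half plane

# Reference trajectory (2): dX = theta_true * A X dt + gamma^{1/2} dW
X = np.zeros((nsteps + 1, d))
C = -gamma * np.linalg.inv(A + A.T)     # invariant covariance, cf. (16)
X[0] = rng.multivariate_normal(np.zeros(d), C)
for n in range(nsteps):
    X[n + 1] = (X[n] + theta_true * (A @ X[n]) * dt
                + np.sqrt(gamma * dt) * rng.standard_normal(d))

theta = rng.normal(0.0, 1.0, size=M)    # prior ensemble for the parameter

for n in range(nsteps):
    fX = np.outer(theta, A @ X[n])      # f(X_n, theta_i), shape (M, d)
    fbar = fX.mean(axis=0)
    # Kalman gain (11): 1 x d covariance times a d x d regularised inverse
    Cthf = ((theta - theta.mean())[:, None] * fX).mean(axis=0)
    S = gamma * np.eye(d) + dt * (fX - fbar).T @ fX / M
    K = Cthf @ np.linalg.inv(S)
    # Update (10), with each member using its own drift in the innovation
    innov = (X[n + 1] - X[n]) - 0.5 * (fX + fbar) * dt
    theta = theta + innov @ K
```

The ensemble mean contracts towards the true value $\theta^\dagger = 1$ as data are assimilated, while the ensemble spread tracks the shrinking posterior uncertainty.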

*Remark 2* The rough path analysis of the EnKBF presented in [9] is based on a Stratonovich reformulation of (3) and its appropriate time discretisation. Here we follow the Itô/Euler–Maruyama formulation of the data-driven term in (3),

$$\int_0^T g(X_t^\dagger, t)\,\mathrm{d}X_t^\dagger = \lim_{\Delta t \to 0}\sum_{l=1}^{L} g(X_{t_{l-1}}^\dagger, t_{l-1})\,(X_{t_l}^\dagger - X_{t_{l-1}}^\dagger) \tag{12}$$

for any continuous function $g(x, t)$ and $\Delta t = T/L$, as it corresponds to the standard implementation of the EnKBF and is easier to analyse in the context of this paper.

The EnKBF provides only an approximate solution to the Bayesian inference problem for general nonlinear $f(x, \theta)$. However, it becomes exact in the mean-field limit for affine drift functions $f(x, \theta) = \theta A x + Bx + c$.

*Example 1* Consider the stochastic partial differential equation

$$
\partial\_t \mu = -U \partial\_y \mu + \rho \partial\_y^2 \mu + \dot{\mathcal{W}} \tag{13}
$$

over a periodic spatial domain *y* ∈ [0*, L)*, where W*(t, y)* denotes space-time white noise, and *U* ∈ ℝ, *ρ >* 0 are given parameters. A standard finite-difference discretisation in space with *d* grid points and mesh-size *Δy* leads to a linear system of SDEs of the form

$$\mathrm{d}\mathbf{u}\_{t} = -(UD + \rho DD^{\mathrm{T}})\mathbf{u}\_{t}\,\mathrm{d}t + \Delta y^{-1/2}\mathrm{d}W\_{t},\tag{14}$$

where **u***t* ∈ ℝ*d* denotes the vector of grid approximations at time *t*, *D* ∈ ℝ*d*×*d* a finite-difference approximation of the spatial derivative *∂y*, and *Wt* standard *d*-dimensional Brownian motion. We can now set *Xt* = **u***t* and *γ* = 1*/Δy*, and identify either *θ* = *U* or *θ* = *ρ* as the unknown parameter in order to obtain an SDE of the form (1).
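To make Example 1 concrete, the following sketch assembles a periodic finite-difference matrix *D* and integrates the resulting linear SDE (14) with an Euler–Maruyama step. The grid size, domain length, and the values of *U* and *ρ* are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative discretisation of the advection-diffusion SPDE (13); the grid
# size d, the domain length, and the values of U and rho are assumed here.
d = 64
Dy = 2.0 * np.pi / d                      # mesh-size on the periodic domain
U, rho = 1.0, 0.1

# Forward-difference approximation D of the spatial derivative d/dy with
# periodic wrap-around: (S u)_j = u_{j+1 mod d}.
S = np.roll(np.eye(d), -1, axis=0)
D = (S - np.eye(d)) / Dy

# Drift matrix of the linear SDE (14): du = -(U D + rho D D^T) u dt + ...
B = -(U * D + rho * D @ D.T)

def euler_maruyama(u0, drift, gamma, dt, n_steps, rng):
    """Euler-Maruyama steps for du = drift u dt + gamma^{1/2} dW."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u + dt * (drift @ u) + np.sqrt(gamma * dt) * rng.standard_normal(u.shape)
    return u
```

Setting *γ* = 1*/Δy* reproduces the noise scaling of (14); identifying *θ* = *U* (or *θ* = *ρ*) then puts the simulated system in the form (1).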

In this note, we further simplify our given inference problem to the case

$$f(\mathbf{x}, \theta) = \theta A \mathbf{x} \,, \tag{15}$$

where *A* ∈ ℝ*d*×*d* is a normal matrix with eigenvalues in the left half-plane, that is, *σ(A)* ⊂ ℂ−. The reference parameter value is set to *θ*† = 1. Hence the SDE (2) possesses a Gaussian invariant measure with mean zero and covariance matrix

$$C = -\gamma (A + A^{\mathrm{T}})^{-1}.\tag{16}$$

We assume from now on that the observations *X†t* are realisations of (2) with initial condition *X†*0 ∼ N*(*0*, C)*.

Under these assumptions, the EnKBF (3) simplifies drastically, and we obtain

$$\mathrm{d}\Theta\_{t} = \frac{\sigma\_{t}}{\gamma} (AX\_{t}^{\dagger})^{\mathrm{T}} \mathrm{d}I\_{t},\tag{17a}$$

$$\mathrm{d}I\_{t} = \mathrm{d}X\_{t}^{\dagger} - \frac{1}{2} \left( \Theta\_{t} + \pi\_{t}[\theta] \right) AX\_{t}^{\dagger}\, \mathrm{d}t,\tag{17b}$$

with variance

$$
\sigma\_t = \pi\_t \left[ \left( \theta - \pi\_t[\theta] \right)^2 \right]. \tag{18}
$$

*Remark 3* For completeness, we state the corresponding formulation for the stochastic gradient descent approach (9):

$$\mathrm{d}\theta\_{t} = \frac{\alpha\_{t}}{\gamma} (AX\_{t}^{\dagger})^{\mathrm{T}} \mathrm{d}\tilde{I}\_{t},\tag{19a}$$

$$\mathrm{d}\tilde{I}\_t = \mathrm{d}X\_t^\dagger - \theta\_t A X\_t^\dagger\, \mathrm{d}t. \tag{19b}$$

We find that the learning rate *αt* takes the role of the variance *σt* in (17). However, we emphasise again that the same pathwise stochastic integrals arise from both formulations, and therefore, the same robustness issue of the resulting estimators *θt* , *t >* 0, arises.

Similarly, the discrete-time mean-field EnKBF (10) reduces to

$$\Theta\_{n+1} = \Theta\_n + K\_n \left\{ (X\_{t\_{n+1}}^\dagger - X\_{t\_n}^\dagger) - \frac{1}{2} \left( \Theta\_n + \pi\_n[\theta] \right) AX\_{t\_n}^\dagger \Delta t \right\} \tag{20}$$

with Kalman gain

$$K\_n = \sigma\_n (AX\_{t\_n}^\dagger)^\mathrm{T} \left( \gamma + \Delta t \sigma\_n (AX\_{t\_n}^\dagger)^\mathrm{T} AX\_{t\_n}^\dagger \right)^{-1} . \tag{21}$$

Furthermore, since *X†t* ∼ N*(*0*, C)*,

$$(AX\_t^\dagger)^\mathrm{T} A X\_t^\dagger = (A^\mathrm{T} A) : (X\_t^\dagger \otimes X\_t^\dagger) \approx (A^\mathrm{T} A) : C \tag{22}$$

for *d* ≫ 1, and we may simplify the Kalman gain to

$$K\_n = \sigma\_n \left( A X\_{t\_n}^\dagger \right)^\mathrm{T} \left( \gamma + \Delta t \sigma\_n \left( A^\mathrm{T} A \right) : C \right)^{-1} \,. \tag{23}$$

Here we have used the notation *A* : *B* = tr*(A*T*B)* to denote the Frobenius inner product of two matrices *A, B* ∈ ℝ*d*×*d*. The approximation (22) becomes exact in the limit *d* → ∞, which we will frequently assume in the following section. Please note that

$$K\_n = \frac{\sigma\_n}{\gamma} \left( A X\_{t\_n}^\dagger \right)^\mathrm{T} + \mathcal{O}(\Delta t) \tag{24}$$

under the stated assumptions.
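As a minimal Monte-Carlo sketch, the discrete-time update (20) with gain (21) can be implemented for an ensemble of parameter particles as follows; the synthetic data path, the matrix *A*, and all numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup (assumed values): a stable matrix A, noise amplitude
# gamma, step-size dt, number of steps, and ensemble size.
A = -0.5 * np.array([[1.0, -1.0], [1.0, 1.0]])
gamma, dt, n_steps, n_particles = 1.0, 0.01, 400, 200

# Synthetic reference path of (2) with theta^dagger = 1 (Euler-Maruyama).
path = [np.zeros(2)]
for _ in range(n_steps):
    X = path[-1]
    path.append(X + dt * (A @ X) + np.sqrt(gamma * dt) * rng.standard_normal(2))

# Parameter ensemble with prior mean 0 and prior variance 4.
theta = rng.normal(0.0, 2.0, size=n_particles)

for n in range(n_steps):
    Xn, Xn1 = path[n], path[n + 1]
    AX = A @ Xn
    sigma_n = theta.var()                                  # ensemble variance (18)
    K = sigma_n * AX / (gamma + dt * sigma_n * (AX @ AX))  # Kalman gain (21)
    # Update (20): a data term common to all particles plus a
    # particle-dependent drift term.
    theta = theta + K @ (Xn1 - Xn) - 0.5 * (theta + theta.mean()) * (K @ AX) * dt
```

For small *Δt* the ensemble variance contracts monotonically, in agreement with the variance update (46) derived below in the text.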

*Remark 4* The Stratonovitch reformulation of (17) replaces (17a) by

$$\mathbf{d}\,\Theta\_{l} = \frac{\sigma\_{l}}{\mathcal{V}} \left\{ (AX\_{l}^{\dagger})^{\mathrm{T}} \diamond \mathbf{d}I\_{l} - \frac{\mathcal{V}}{2} \mathrm{tr}\,(A)\,\mathrm{d}t \right\}.\tag{25}$$

The innovation *It* remains as before. See Appendix B of [9] for more details. An appropriate time discretisation of the innovation-driven term replaces the Kalman gain (21) by

$$K\_{n+1/2} = \sigma\_n (AX\_{t\_{n+1/2}}^{\dagger})^{\mathrm{T}} \left(\gamma + \Delta t \sigma\_n (AX\_{t\_{n+1/2}}^{\dagger})^{\mathrm{T}} AX\_{t\_{n+1/2}}^{\dagger}\right)^{-1},\tag{26}$$

where

$$X\_{t\_{n+1/2}}^{\dagger} = \frac{1}{2} (X\_{t\_n}^{\dagger} + X\_{t\_{n+1}}^{\dagger}) \,. \tag{27}$$

Please note that a midpoint discretisation of the data-driven term in (25) results in

$$(AX\_{t\_{n+1/2}}^{\dagger})^\mathrm{T} (X\_{t\_{n+1}}^{\dagger} - X\_{t\_n}^{\dagger}) = (AX\_{t\_n}^{\dagger})^\mathrm{T} (X\_{t\_{n+1}}^{\dagger} - X\_{t\_n}^{\dagger}) + \tag{28a}$$

$$\frac{1}{2}A^{\mathrm{T}}:(X\_{t\_{n+1}}^{\dagger} - X\_{t\_n}^{\dagger}) \otimes (X\_{t\_{n+1}}^{\dagger} - X\_{t\_n}^{\dagger})\tag{28b}$$

and that

$$\frac{1}{2}A^\mathrm{T} : (X\_{t\_{n+1}}^\dagger - X\_{t\_n}^\dagger) \otimes (X\_{t\_{n+1}}^\dagger - X\_{t\_n}^\dagger) \approx \frac{\Delta t \,\gamma}{2} \mathrm{tr}\,(A),\tag{29}$$

which justifies the additional drift term in (25). A precise meaning of the approximation in (29) will be given in Remark 5 below.
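The identity (28) is purely algebraic and can be checked directly; the matrix and the two path values in the sketch below are arbitrary illustrative data.

```python
import numpy as np

# Deterministic check of the algebraic identity (28): the midpoint rule equals
# the left-point rule plus one half of A^T : (Delta X (x) Delta X).
rng = np.random.default_rng(2)
d = 3
A = rng.standard_normal((d, d))
Xn, Xn1 = rng.standard_normal(d), rng.standard_normal(d)

dX = Xn1 - Xn
Xmid = 0.5 * (Xn + Xn1)

midpoint = (A @ Xmid) @ dX
leftpoint = (A @ Xn) @ dX
correction = 0.5 * np.trace(A @ np.outer(dX, dX))   # A^T : B = tr(A B)
```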

Alternatively, if one wishes to explicitly utilise the availability of continuous-time data *X†t*, one could apply the following variant of (20):

$$\Theta\_{n+1} = \Theta\_n + \frac{\sigma\_n}{\gamma} \int\_{t\_n}^{t\_{n+1}} (AX\_t^\dagger)^\mathrm{T} \mathrm{d}X\_t^\dagger - \frac{1}{2} K\_n A X\_{t\_n}^\dagger \left(\Theta\_n + \pi\_n[\theta] \right) \Delta t,\tag{30}$$

and following the Itô/Euler–Maruyama approximation (12), discretise the integral with a small inner step-size *Δτ* = *Δt/L*, *L* ≫ 1; that is,

$$\int\_{t\_n}^{t\_{n+1}} (AX\_t^\dagger)^\mathrm{T} \mathrm{d}X\_t^\dagger \approx \sum\_{l=0}^{L-1} (AX\_{\tau\_l}^\dagger)^\mathrm{T} (X\_{\tau\_{l+1}}^\dagger - X\_{\tau\_l}^\dagger) \tag{31}$$

with *τl* = *tn* + *lΔτ* . We note that

$$\sum\_{l=0}^{L-1} (AX\_{\tau\_l}^{\dagger})^\mathrm{T} (X\_{\tau\_{l+1}}^{\dagger} - X\_{\tau\_l}^{\dagger}) = (AX\_{t\_n}^{\dagger})^\mathrm{T} (X\_{t\_{n+1}}^{\dagger} - X\_{t\_n}^{\dagger}) + \tag{32a}$$

$$A^\mathrm{T} : \left(\sum\_{l=0}^{L-1} (X\_{\tau\_l}^\dagger - X\_{t\_n}^\dagger) \otimes (X\_{\tau\_{l+1}}^\dagger - X\_{\tau\_l}^\dagger)\right),\tag{32b}$$

which is at the heart of rough path analysis [13] and which we utilise in the following section.
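The decomposition (32) is likewise an exact algebraic identity for any discrete path, as the following sketch with arbitrary illustrative data confirms.

```python
import numpy as np

# Deterministic check of the decomposition (32): a left-point Riemann sum
# splits exactly into a first-order increment term plus a second-order
# iterated sum.
rng = np.random.default_rng(3)
d, L = 3, 50
A = rng.standard_normal((d, d))
X = rng.standard_normal((L + 1, d))        # X[l] stands for X^dagger_{tau_l}

riemann = sum((A @ X[l]) @ (X[l + 1] - X[l]) for l in range(L))

second_order = sum(np.outer(X[l] - X[0], X[l + 1] - X[l]) for l in range(L))
split = (A @ X[0]) @ (X[L] - X[0]) + np.trace(A @ second_order)
```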

#### **3 Frequentist Analysis**

It is well-known that the second-order contribution in (32) leads to a discontinuous dependence of the integral on the observed *X†t* in the uniform norm topology on the space of continuous functions. Rough path theory fixes this problem by defining appropriately extended topologies and has been extended to the EnKBF in [9]. In this section, we complement the path-wise analysis from [9] by an analysis of the impact of the second-order contribution on the EnKBF (17) from a frequentist perspective, which studies the behaviour of the EnKBF over all possible observations *X†t* subject to (2). In other words, one switches from a strong solution concept to a weak one. While we assume throughout this section that the observations satisfy (2), we will analyse the impact of a perturbed observation process on the EnKBF in Sect. 4.

We first derive evolution equations for the conditional mean and variance under the assumption that *Θ*0 is Gaussian distributed with given prior mean *m*prior and variance *σ*prior. It follows directly from (17) that the conditional mean *μt* = *πt*[*θ*], that is, the mean of *Θt*, satisfies the SDE

$$\mathrm{d}\mu\_{t} = \frac{\sigma\_{t}}{\gamma} \left( (AX\_{t}^{\dagger})^{\mathrm{T}} \mathrm{d}X\_{t}^{\dagger} - \mu\_{t} \left( A^{\mathrm{T}} A \right) : (X\_{t}^{\dagger} \otimes X\_{t}^{\dagger}) \, \mathrm{d}t \right), \tag{33}$$

Frequentist Perspective on Estimation Using the EnKF 245

which simplifies to

$$\mathrm{d}\mu\_{t} = \frac{\sigma\_{t}}{\gamma} \left( (AX\_{t}^{\dagger})^{\mathrm{T}} \mathrm{d}X\_{t}^{\dagger} - \mu\_{t} \left( A^{\mathrm{T}} A \right) : C \, \mathrm{d}t \right), \tag{34}$$

under the approximation (22). The initial condition is *μ*0 = *m*prior. The evolution equation for the conditional variance, that is, the variance of *Θt*, is given by

$$\frac{\mathrm{d}}{\mathrm{d}t}\sigma\_t = -\frac{\sigma\_t^2}{\gamma} \left( A^\mathrm{T} A \right) : \left( X\_t^\dagger \otimes X\_t^\dagger \right) \tag{35}$$

with initial condition *σ*0 = *σ*prior, which again reduces to

$$\frac{\mathrm{d}}{\mathrm{d}t}\sigma\_{t} = -\frac{\sigma\_{t}^{2}}{\gamma} \left(A^{\mathrm{T}}A\right) : C \tag{36}$$

under the approximation (22).

We now perform a frequentist analysis of the estimator *μt* defined by (34) and (36), that is, we perform a weak analysis of the SDE (34) in terms of the first two moments of *μt* [29]. In the first step, we take the expectation of (34) over all realisations *X†t* of the SDE (2), which we denote by

$$m\_t := \mathbb{E}^\dagger[\mu\_t]. \tag{37}$$

The associated evolution equation is given by

$$\frac{\mathrm{d}}{\mathrm{d}t}m\_{t} = \frac{\sigma\_{t}}{\gamma} \left(A^{\mathrm{T}}A\right) : \mathbb{E}^{\dagger} \left[X\_{t}^{\dagger} \otimes X\_{t}^{\dagger}\right] - \frac{\sigma\_{t}}{\gamma} \left(A^{\mathrm{T}}A\right) : C\, m\_{t},\tag{38}$$

which reduces to

$$\frac{\mathrm{d}}{\mathrm{d}t}m\_{t} = \frac{\sigma\_{t}}{\gamma} \left(A^{\mathrm{T}}A\right) : C\left(1 - m\_{t}\right) = -\sigma\_{t} \left(A^{\mathrm{T}}A\right) : \left(A + A^{\mathrm{T}}\right)^{-1} \left(1 - m\_{t}\right). \tag{39}$$

In the second step, we also look at the frequentist variance

$$p\_t := \mathbb{E}^\dagger[(\mu\_t - m\_t)^2].\tag{40}$$

Using

$$\mathrm{d}(\mu\_t - m\_t) = \frac{\sigma\_t}{\gamma} \left\{ (A^\mathrm{T} A) : \left( X\_t^\dagger \otimes X\_t^\dagger - C \right) \mathrm{d}t + \gamma^{1/2} (A X\_t^\dagger)^\mathrm{T} \mathrm{d}W\_t^\dagger \right\} \, - \tag{41a}$$

$$\frac{\sigma\_t}{\gamma}(A^\mathrm{T}A):C\, (\mu\_t-m\_t)\,\mathrm{d}t,\tag{41b}$$

we obtain

$$\frac{\mathrm{d}}{\mathrm{d}t}p\_t = -\frac{\sigma\_t}{\gamma}\left(A^\mathrm{T} A\right) : C\left(2p\_t - \sigma\_t\right) + \tag{42a}$$

$$\frac{2\sigma\_{t}}{\gamma} \left( A^{\mathrm{T}} A \right) : \mathbb{E}^{\dagger} \left[ \left( X\_{t}^{\dagger} \otimes X\_{t}^{\dagger} - C \right) \left( \mu\_{t} - m\_{t} \right) \right], \tag{42b}$$

which we simplify to

$$\frac{\mathrm{d}}{\mathrm{d}t}p\_t = \frac{\sigma\_t}{\gamma} \left(A^\mathrm{T}A\right) : C\left(\sigma\_t - 2p\_t\right) = -\sigma\_t \left(A^\mathrm{T}A\right) : \left(A + A^\mathrm{T}\right)^{-1} \left(\sigma\_t - 2p\_t\right) \tag{43}$$

under the approximation (22). The initial conditions are *m*0 = *m*prior and *p*0 = 0, respectively. We note that the differential equations (36) and (43) are explicitly solvable. For example, it holds that

$$\sigma\_t = \frac{\sigma\_0}{1 - (A^\mathrm{T}A) : (A^\mathrm{T} + A)^{-1}\sigma\_0 t} \tag{44}$$

and one finds that *σt* ∼ −1*/((A*T*A)* : *(A*T + *A)*−1 *t)* for *t* → ∞. It can also be shown that *pt* ≤ *σt* for all *t* ≥ 0. Furthermore, this analysis suggests that the learning rate in the stochastic gradient descent formulation (19) should be chosen as

$$\alpha\_t = \min\left\{\bar{\alpha}, -\frac{1}{(A^\mathrm{T}A) : (A^\mathrm{T} + A)^{-1}\, t}\right\},\tag{45}$$

where *ᾱ >* 0 denotes an initial learning rate; for example, *ᾱ* = *σ*0.
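A short numerical sketch confirms the closed-form variance (44) against a direct Euler integration of (36) and evaluates a learning-rate schedule in the spirit of (45). The matrix *A* and *γ* are illustrative choices (they match the numerical example of Sect. 5, for which (*A*T*A*) : *C* = 1).

```python
import numpy as np

# Check of the closed-form variance (44) against a direct Euler integration
# of (36), plus the learning-rate schedule (45). A and gamma are assumed.
A = -0.5 * np.array([[1.0, -1.0], [1.0, 1.0]])
gamma = 1.0
C = -gamma * np.linalg.inv(A + A.T)        # invariant covariance (16)
c = np.trace((A.T @ A) @ C) / gamma        # (A^T A) : C / gamma

sigma0, T, dt = 4.0, 6.0, 1e-4

sigma = sigma0                             # forward Euler for (36)
for _ in range(int(T / dt)):
    sigma -= dt * c * sigma**2

sigma_exact = sigma0 / (1.0 + c * sigma0 * T)   # closed form (44)

def learning_rate(t, alpha_bar=sigma0):
    """Learning-rate schedule in the spirit of (45), with alpha_bar = sigma0."""
    return alpha_bar if t == 0.0 else min(alpha_bar, 1.0 / (c * t))
```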

We finally conduct a formal analysis of the ensemble Kalman filter time-stepping (20) and demonstrate that the method is first-order accurate with regard to the implied frequentist mean *mt* . We recall (24) and conclude from (20) that the implied update on the variance *σn* satisfies

$$
\sigma\_{n+1} = \sigma\_n - \frac{\sigma\_n^2}{\gamma} \left( A^\mathrm{T} A \right) : C\, \Delta t + \mathcal{O}(\Delta t^2), \tag{46}
$$

which provides a first-order approximation to (36).

We next analyse the evolution equation (34) for the conditional mean *μt* and its numerical approximation

$$
\mu\_{n+1} = \mu\_n + K\_n \left\{ (X\_{t\_{n+1}}^\dagger - X\_{t\_n}^\dagger) - \mu\_n A X\_{t\_n}^\dagger \Delta t \right\} \tag{47}
$$

arising from (20). Here we follow [13] in order to analyse the impact of the data *X†t* on the estimator. An in-depth theoretical treatment can be found in [9].

Comparing (47) to (34) and utilising (24), we find that the key quantity of interest is

$$J\_{t\_n, t\_{n+1}}^{\dagger} := \int\_{t\_n}^{t\_{n+1}} (AX\_t^\dagger)^\mathrm{T} \mathrm{d}X\_t^\dagger,\tag{48}$$

which we can rewrite as

$$J\_{t\_n,t\_{n+1}}^{\dagger} = A^{\mathrm{T}} : (X\_{t\_n}^{\dagger} \otimes X\_{t\_n,t\_{n+1}}^{\dagger}) + A^{\mathrm{T}} : \mathbb{X}\_{t\_n,t\_{n+1}}^{\dagger} \,. \tag{49}$$

Here, motivated by (32) and following standard rough path notation, we have used

$$X\_{t\_n, t\_{n+1}}^{\dagger} := X\_{t\_{n+1}}^{\dagger} - X\_{t\_n}^{\dagger} \tag{50}$$

and the second-order iterated Itô integral

$$\mathbb{X}\_{t\_n, t\_{n+1}}^{\dagger} := \int\_{t\_n}^{t\_{n+1}} (X\_t^{\dagger} - X\_{t\_n}^{\dagger}) \otimes \mathrm{d}X\_t^{\dagger}. \tag{51}$$

The difference between the integral (48) and its corresponding approximation in (47) is provided by *A*T : 𝕏†*tn,tn*+1 plus higher-order terms arising from (24). The iterated integral 𝕏†*tn,tn*+1 becomes a random variable from the frequentist perspective. Taking note of (2), we find that the drift, *f(x)* = *Ax*, contributes terms of order O*(Δt*2*)* to 𝕏†*tn,tn*+1, and the expected value of 𝕏†*tn,tn*+1 therefore satisfies

$$\mathbb{E}^{\dagger}[\mathbb{X}\_{t\_n, t\_{n+1}}^{\dagger}] = \mathcal{O}(\Delta t^2),\tag{52}$$

since E†[*W*†*tn,τ*] = 0 for *τ > tn*, and

$$\mathbb{E}^{\dagger}[\mathbb{W}\_{t\_n,t\_{n+1}}^{\dagger}] = \frac{1}{2}\mathbb{E}^{\dagger}\left[W\_{t\_n,t\_{n+1}}^{\dagger} \otimes W\_{t\_n,t\_{n+1}}^{\dagger} - [W\_{t\_n}^{\dagger}, W\_{t\_n,t\_{n+1}}^{\dagger}]\right] - \frac{\Delta t}{2}I = 0,\qquad(53)$$

where we have introduced the commutator

$$\left[W\_{t\_n}^\dagger, W\_{t\_n, t\_{n+1}}^\dagger\right] := W\_{t\_n}^\dagger \otimes W\_{t\_n, t\_{n+1}}^\dagger - W\_{t\_n, t\_{n+1}}^\dagger \otimes W\_{t\_n}^\dagger. \tag{54}$$

Hence we find that, while (47) is not a first-order (strong) approximation of the SDE (34), the approximation becomes first-order in *mt* when averaged over realisations *X†t* of the SDE (2). More precisely, one obtains

$$\mathbb{E}^{\dagger}[J\_{t\_n, t\_{n+1}}^{\dagger}] = (A^{\mathrm{T}}A) : C\,\Delta t + \mathcal{O}(\Delta t^2). \tag{55}$$

We note that the modified scheme (30) leads to the same time evolution in the variance *σn* while the update in *μn* is changed to

$$
\mu\_{n+1} = \mu\_n + \frac{\sigma\_n}{\gamma} \int\_{t\_n}^{t\_{n+1}} (AX\_t^\dagger)^\mathrm{T} \mathrm{d}X\_t^\dagger - K\_n A X\_{t\_n}^\dagger \mu\_n \Delta t. \tag{56}
$$

This modification results in a more accurate evolution of the conditional mean *μn*, but because of (52) it does not impact the evolution of the underlying frequentist mean, *mn* = E†[*μn*], to leading order. We summarise our findings in the following proposition.

**Proposition 1** *The discrete-time EnKBF implementations (20) and (30) both provide first-order approximations to the time evolution of the frequentist mean, mt , and the frequentist variance, pt . In other words, both methods converge weakly with order one.*

We also note that the frequentist uncertainty is essentially data-independent and depends only on the time window [0*, T*] over which the data gets observed. Hence, for a fixed observation interval [0*, T*], it makes sense to choose the step-size *Δt* such that the discretisation error (bias) remains of the same order of magnitude as *pT*1/2 ≈ *σT*1/2. Selecting a much smaller step-size would not significantly reduce the frequentist estimation error in the conditional estimator *μT*.

*Remark 5* We can now give a precise reformulation of the approximation (29):

$$\frac{1}{2}\mathbb{E}^{\dagger}\left[A^{\mathrm{T}}:(X\_{t\_n,t\_{n+1}}^{\dagger}\otimes X\_{t\_n,t\_{n+1}}^{\dagger})\right]=\frac{\Delta t\,\gamma}{2}\mathrm{tr}\left(A\right)+\mathcal{O}(\Delta t^{2}),\tag{57}$$

which is at the heart of the Stratonovich formulation (25) of the EnKBF [9].

#### **4 Multi-Scale Data**

We now have all the material in place to study the dependency of the EnKBF estimator on a set of observations *X(ε)t*, *ε >* 0, which approach the theoretical *X†t* with respect to the uniform norm topology on the space of continuous functions as *ε* → 0. Since the second-order contribution in (32), that is (51), does not depend continuously on such perturbations, we demonstrate in this section that a systematic bias arises in the EnKBF. Furthermore, we show how the bias can be eliminated either via subsampling the data, which effectively amounts to ignoring these second-order contributions, or via an appropriate correction term, which ensures a continuous dependence on observations *X(ε)t* with respect to the uniform norm topology. More specifically, we investigate the impact of a possible discrepancy between the SDE model (1), for which we aim to estimate the parameter *θ*, and the data generating SDE (2). We therefore replace (2) by the following two-scale SDE [17]:


$$\mathrm{d}X\_{t}^{(\epsilon)} = A X\_{t}^{(\epsilon)} \,\mathrm{d}t + \frac{\gamma^{1/2}}{\epsilon} M P\_{t}^{(\epsilon)} \,\mathrm{d}t,\tag{58a}$$

$$\mathrm{d}P\_{t}^{(\epsilon)} = -\frac{1}{\epsilon} M P\_{t}^{(\epsilon)} \, \mathrm{d}t + \mathrm{d}W\_{t}^{\dagger},\tag{58b}$$

where

$$M = \begin{pmatrix} 1 & \beta \\ -\beta & 1 \end{pmatrix},\tag{59}$$

*β* = 2 and *ε* = 0*.*01. The dimension of state space is *d* = 2 throughout this section. While we restrict ourselves here to the simple two-scale model (58), similar scenarios can arise from deterministic fast–slow systems [24, 7].

The associated EnKBF mean-field equations in the parameter *Θt*, which we now denote by *Θ(ε)t* in order to explicitly record their dependence on the scale parameter *ε* ≪ 1, become

$$\mathrm{d}\Theta\_{t}^{(\epsilon)} = \frac{\sigma\_{t}^{(\epsilon)}}{\gamma} (AX\_{t}^{(\epsilon)})^{\mathrm{T}} \mathrm{d}I\_{t}^{(\epsilon)},\tag{60a}$$

$$\mathrm{d}I\_{t}^{(\epsilon)} = \mathrm{d}X\_{t}^{(\epsilon)} - \frac{1}{2} \left( \Theta\_{t}^{(\epsilon)} + \pi\_{t}^{(\epsilon)}[\theta] \right) AX\_{t}^{(\epsilon)}\, \mathrm{d}t,\tag{60b}$$

with variance

$$
\sigma\_{t}^{(\epsilon)} = \pi\_{t}^{(\epsilon)} \left[ \left( \theta - \pi\_{t}^{(\epsilon)}[\theta] \right)^2 \right] \tag{61}
$$

and *Θ(ε)t* ∼ *π(ε)t*. The discrete-time mean-field EnKBF (20) turns into

$$\Theta\_{n+1}^{(\epsilon)} = \Theta\_n^{(\epsilon)} + K\_n^{(\epsilon)} \left\{ \left( X\_{t\_{n+1}}^{(\epsilon)} - X\_{t\_n}^{(\epsilon)} \right) - \frac{1}{2} \left( \Theta\_n^{(\epsilon)} + \pi\_n^{(\epsilon)}[\theta] \right) A X\_{t\_n}^{(\epsilon)} \Delta t \right\} \tag{62}$$

with Kalman gain

$$K\_n^{(\epsilon)} = \sigma\_n^{(\epsilon)} (AX\_{t\_n}^{(\epsilon)})^\mathrm{T} \left( \gamma + \Delta t \sigma\_n^{(\epsilon)} (AX\_{t\_n}^{(\epsilon)})^\mathrm{T} AX\_{t\_n}^{(\epsilon)} \right)^{-1} . \tag{63}$$

We also consider the appropriately modified scheme (30):

$$
\Theta\_{n+1}^{(\epsilon)} = \Theta\_n^{(\epsilon)} + \frac{\sigma\_n^{(\epsilon)}}{\gamma} \int\_{t\_n}^{t\_{n+1}} (AX\_t^{(\epsilon)})^\mathrm{T} \mathrm{d}X\_t^{(\epsilon)} - \frac{1}{2} K\_n^{(\epsilon)} AX\_{t\_n}^{(\epsilon)} \left(\Theta\_n^{(\epsilon)} + \pi\_n^{(\epsilon)}[\theta]\right) \Delta t. \tag{64}
$$

In order to understand the impact of the modified data generating process on the two mean-field EnKBF formulations (62) and (64), respectively, we follow [17] and investigate the difference between *X(ε)t* and *X†t*:

**Fig. 1** SDE driven by mathematical vs. physical Brownian motion (*ε* = 0*.*01). The top panel displays both *X†t* (blue) and *X(ε)t* (red) over the long time interval *t* ∈ [0*,* 10], while the lower panel provides a zoomed-in perspective over the interval *t* ∈ [0*,* 1]

$$\mathrm{d}(X\_{t}^{(\epsilon)} - X\_{t}^{\dagger}) = A(X\_{t}^{(\epsilon)} - X\_{t}^{\dagger})\,\mathrm{d}t + \frac{\gamma^{1/2}}{\epsilon} M P\_{t}^{(\epsilon)}\,\mathrm{d}t - \gamma^{1/2}\mathrm{d}W\_{t}^{\dagger} \tag{65a}$$

$$=A(X\_t^{(\epsilon)} - X\_t^\dagger)\,\mathrm{d}t - \gamma^{1/2}\mathrm{d}P\_t^{(\epsilon)}.\tag{65b}$$

When *P(ε)t* is stationary, it is Gaussian with mean zero and covariance

$$\mathbb{E}\_{\text{stat}}\left[P\_t^{(\epsilon)} \otimes P\_t^{(\epsilon)}\right] = \epsilon \left(M + M^{\text{T}}\right)^{-1} = \frac{\epsilon}{2}I. \tag{66}$$

Hence *P(ε)t* → 0 as *ε* → 0, and also

$$X\_t^{(\epsilon)} \to X\_t^\dagger \tag{67}$$

in *L*2 uniformly in *t*, provided *σ(A)* ⊂ ℂ− and *X(ε)*0 = *X†*0. This is illustrated in Fig. 1.
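The stationary covariance (66) can be verified directly: the stationary covariance Σ of the fast OU process (58b) solves the Lyapunov equation *M*Σ + Σ*M*T = *ε I*, and the candidate *ε(M + M*T*)*−1 = (*ε/*2)*I* satisfies it because *M* + *M*T = 2*I* for the matrix (59). A sketch with the parameter values from the text:

```python
import numpy as np

# Check of the stationary covariance (66) of the fast OU process (58b):
# Sigma must solve the Lyapunov equation M Sigma + Sigma M^T = eps I.
# Parameter values beta = 2 and eps = 0.01 as in the text.
beta, eps = 2.0, 0.01
M = np.array([[1.0, beta], [-beta, 1.0]])

Sigma = eps * np.linalg.inv(M + M.T)       # candidate from (66)
residual = M @ Sigma + Sigma @ M.T - eps * np.eye(2)
```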

In order to investigate the problem further, we study the integral

$$J\_{t\_n, t\_{n+1}}^{(\epsilon)} := \int\_{t\_n}^{t\_{n+1}} (AX\_t^{(\epsilon)})^\mathrm{T} \mathrm{d}X\_t^{(\epsilon)} \tag{68}$$

and its relation to (48). As for (48), we can rewrite (68) as

$$J\_{t\_n,t\_{n+1}}^{(\epsilon)} = A^{\mathrm{T}} : (X\_{t\_n}^{(\epsilon)} \otimes X\_{t\_n,t\_{n+1}}^{(\epsilon)}) + A^{\mathrm{T}} : \mathbb{X}\_{t\_n,t\_{n+1}}^{(\epsilon)}. \tag{69}$$

We now investigate the limit of the second-order iterated integral

$$\mathbb{X}\_{t\_{n},t\_{n+1}}^{(\epsilon)} = \int\_{t\_{n}}^{t\_{n+1}} X\_{t\_{n},t}^{(\epsilon)} \otimes \mathrm{d}X\_{t}^{(\epsilon)} \tag{70a}$$


$$=\frac{1}{2}X\_{t\_n,t\_{n+1}}^{(\epsilon)}\otimes X\_{t\_n,t\_{n+1}}^{(\epsilon)} - \frac{1}{2}\int\_{t\_n}^{t\_{n+1}}[X\_{t\_n,t}^{(\epsilon)}, \mathbf{d}X\_{t}^{(\epsilon)}] \tag{70b}$$

as *ε* → 0 [17]. Here [*·, ·*] denotes the commutator defined by (54).

**Proposition 2** *The second-order iterated integral* 𝕏*(ε)tn,tn*+1 *satisfies*

$$\lim\_{\epsilon \to 0} \mathbb{X}\_{t\_n, t\_{n+1}}^{(\epsilon)} = \mathbb{X}\_{t\_n, t\_{n+1}}^{\dagger} + \frac{\Delta t \,\gamma}{2} M. \tag{71}$$

*Proof* The proof follows [17] and can be summarised as follows:

$$\mathbb{X}\_{t\_n,t\_{n+1}}^{(\epsilon)} = \int\_{t\_n}^{t\_{n+1}} X\_{t\_n,t}^{(\epsilon)} \otimes \mathrm{d}X\_{t}^{(\epsilon)} \tag{72a}$$

$$\rightarrow \int\_{t\_n}^{t\_{n+1}} X\_{t\_n,t}^{\dagger} \otimes \mathrm{d}X\_{t}^{\dagger} - \gamma^{1/2} \int\_{t\_n}^{t\_{n+1}} X\_{t\_n,t}^{(\epsilon)} \otimes \mathrm{d}P\_{t}^{(\epsilon)} \tag{72b}$$

$$=\mathbb{X}\_{t\_n,t\_{n+1}}^{\dagger} - \gamma^{1/2} X\_{t\_n,t\_{n+1}}^{(\epsilon)} \otimes P\_{t\_{n+1}}^{(\epsilon)} + \gamma^{1/2} \int\_{t\_n}^{t\_{n+1}} \mathrm{d}X\_{t}^{(\epsilon)} \otimes P\_{t}^{(\epsilon)} \tag{72c}$$

$$\rightarrow \mathbb{X}\_{t\_n, t\_{n+1}}^{\dagger} + \gamma^{1/2} \int\_{t\_n}^{t\_{n+1}} \left\{ AX\_t^{(\epsilon)} + \frac{\gamma^{1/2}}{\epsilon} MP\_t^{(\epsilon)} \right\} \otimes P\_t^{(\epsilon)} \, \mathrm{d}t \tag{72d}$$

$$\rightarrow \mathbb{X}\_{t\_n, t\_{n+1}}^{\dagger} + \frac{\Delta t \,\gamma}{\epsilon} M\, \mathbb{E}\_{\text{stat}} \left[ P\_{t\_n}^{(\epsilon)} \otimes P\_{t\_n}^{(\epsilon)} \right] \tag{72e}$$

$$= \mathbb{X}\_{t\_n, t\_{n+1}}^{\dagger} + \frac{\Delta t \,\gamma}{2} M. \tag{72f}$$

As discussed in detail in [9] already, Proposition 2 implies that the scheme (64) does not, in general, converge to the scheme (30) as *ε* → 0 since

$$J\_{t\_n, t\_{n+1}}^{\dagger} = \lim\_{\epsilon \to 0} J\_{t\_n, t\_{n+1}}^{(\epsilon)} - \frac{\Delta t \,\gamma}{2} A^{\mathrm{T}} : M \,. \tag{73}$$

This observation suggests the following modification

$$\Theta\_{n+1}^{(\epsilon)} = \Theta\_n^{(\epsilon)} + \frac{\sigma\_n^{(\epsilon)}}{\gamma} \int\_{t\_n}^{t\_{n+1}} (AX\_t^{(\epsilon)})^\mathrm{T} \mathrm{d}X\_t^{(\epsilon)} - \frac{\Delta t}{2} \sigma\_n^{(\epsilon)} \, A^\mathrm{T} : M \,-\tag{74a}$$

$$\frac{1}{2}K\_n^{(\epsilon)}AX\_{t\_n}^{(\epsilon)}\left(\Theta\_n^{(\epsilon)} + \pi\_n^{(\epsilon)}[\theta]\right)\Delta t\tag{74b}$$

to (64). Please note that it follows from (70) that

$$\int\_{t\_n}^{t\_{n+1}} (AX\_{t}^{(\epsilon)})^{\mathrm{T}} \mathrm{d}X\_{t}^{(\epsilon)} = A^{\mathrm{T}} : \left( X\_{t\_{n+1/2}}^{(\epsilon)} \otimes X\_{t\_n, t\_{n+1}}^{(\epsilon)} - \frac{1}{2} \int\_{t\_n}^{t\_{n+1}} [X\_{t\_n, t}^{(\epsilon)}, \mathrm{d}X\_{t}^{(\epsilon)}] \right) . \tag{75}$$

**Proposition 3** *The discrete-time EnKBF (62) converges to (20) for fixed Δt as ε* → 0*. Similarly, (74) converges to (30) in the same limit.*

*Proof* The first statement follows from *σ(ε)n* = *σn*, the limiting behaviour (67), and

$$\lim\_{\epsilon \to 0} K\_n^{(\epsilon)} = K\_n. \tag{76}$$

The second statement additionally requires (73) to be substituted into (74) when taking the limit *ε* → 0.

*Remark 6* The analogous adaptation of (74) to the gradient descent formulation (19), with *X†t* replaced by *X(ε)t*, becomes

$$\theta\_{n+1}^{(\epsilon)} = \theta\_n^{(\epsilon)} + \frac{\alpha\_{t\_n}}{\gamma} \Big( \int\_{t\_n}^{t\_{n+1}} (AX\_t^{(\epsilon)})^\mathrm{T} \mathrm{d}X\_t^{(\epsilon)} - \frac{\gamma\Delta t}{2} A^\mathrm{T} : M \,- \tag{77a}$$

$$\theta\_n^{(\epsilon)}(AX\_{t\_n}^{(\epsilon)})^\mathrm{T}AX\_{t\_n}^{(\epsilon)}\Delta t \Big). \tag{77b}$$

Alternatively, subsampling the data can be applied which leads to the simpler formulation

$$
\theta\_{n+1}^{(\epsilon)} = \theta\_n^{(\epsilon)} + \frac{\alpha\_{t\_n}}{\gamma} (AX\_{t\_n}^{(\epsilon)})^\mathrm{T} \left( (X\_{t\_{n+1}}^{(\epsilon)} - X\_{t\_n}^{(\epsilon)}) - \theta\_n^{(\epsilon)} AX\_{t\_n}^{(\epsilon)} \Delta t \right). \tag{78}
$$

*Remark 7* A two-scale SDE, closely related to (58), has been investigated in [8] in terms of the time-integrated autocorrelation function of *P(ε)t* and modified stochastic integrals. In our case, the modified quadrature rule, here denoted by ◊, has to satisfy

$$\int\_{t\_n}^{t\_{n+1}} (AX\_{t}^{\dagger})^{\mathrm{T}} \diamond \mathrm{d}X\_{t}^{\dagger} = \lim\_{\epsilon \to 0} \int\_{t\_n}^{t\_{n+1}} (AX\_{t}^{(\epsilon)})^{\mathrm{T}} \mathrm{d}X\_{t}^{(\epsilon)},\tag{79}$$

and it is therefore related to the standard Itô integral via

$$\int\_{t\_n}^{t\_{n+1}} (AX\_{t}^{\dagger})^{\mathrm{T}} \diamond \mathrm{d}X\_{t}^{\dagger} = \int\_{t\_n}^{t\_{n+1}} (AX\_{t}^{\dagger})^{\mathrm{T}} \mathrm{d}X\_{t}^{\dagger} + \frac{\Delta t\, \gamma}{2} A^{\mathrm{T}} : M. \tag{80}$$

Hence *M* plays the role of the integrated autocorrelation function of *P(ε)t* in our approach. We note that the modified quadrature rule reduces to the standard Stratonovich integral if either *β* = 0 in (59) or *A* is symmetric. While the results from [8] could, therefore, also be used as a starting point for discussing the induced estimation bias, practical implementations would still require knowledge of the integrated autocorrelation function of *P(ε)t* or, equivalently, the estimation of *M* in addition to observing *X(ε)t*. We address this aspect next.

The numerical implementation of (74) requires an estimator for the generally unknown *M* in (73). This task is challenging as we only have access to *X(ε)t* without any explicit knowledge of the underlying generating process (58). While the estimator proposed in [9] is based on the idea of subsampling the data, the frequentist perspective taken in this note suggests the alternative estimator *M*est defined by

$$\frac{\Delta t \,\gamma}{2} M\_{\text{est}} = \mathbb{E}^{\dagger} [\mathbb{X}\_{t\_n, t\_{n+1}}^{(\epsilon)}],\tag{81}$$

which follows from (72f) and (52); that is, E†[𝕏†*tn,tn*+1] = O*(Δt*2*)* for *Δt* sufficiently small. Note that the second-order iterated integral 𝕏*(ε)tn,tn*+1 satisfies (70) and is therefore easy to compute. In practice, the frequentist expectation value can be replaced by an approximation along a given single observation path *X(ε)t*, *t* ∈ [0*, T*], under the assumption of ergodicity.

An appropriate choice of the outer or sub-sampling step-size *Δt* [27] constitutes an important aspect for the practical implementation of the EnKBF formulation (62) for finite values of *ε >* 0 [26]. Consistency of the second-order iterated integrals [13] implies

$$\mathbb{X}\_{t\_n, t\_{n+2}}^{(\epsilon)} = \mathbb{X}\_{t\_n, t\_{n+1}}^{(\epsilon)} + \mathbb{X}\_{t\_{n+1}, t\_{n+2}}^{(\epsilon)} + X\_{t\_n, t\_{n+1}}^{(\epsilon)} \otimes X\_{t\_{n+1}, t\_{n+2}}^{(\epsilon)}.\tag{82}$$
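This consistency (Chen's) relation holds exactly for the discrete (Riemann-sum) analogue of the iterated integral (70a), as the following sketch with arbitrary illustrative path data confirms.

```python
import numpy as np

# Deterministic check of the consistency relation (82) for discrete
# second-order iterated sums over two adjacent windows.
rng = np.random.default_rng(4)
d, L = 2, 20
X = rng.standard_normal((2 * L + 1, d))    # fine path over [t_n, t_{n+2}]

def iterated(X, a, b):
    """Discrete second-order iterated sum over the index window [a, b]."""
    return sum(np.outer(X[l] - X[a], X[l + 1] - X[l]) for l in range(a, b))

whole = iterated(X, 0, 2 * L)
pieces = (iterated(X, 0, L) + iterated(X, L, 2 * L)
          + np.outer(X[L] - X[0], X[2 * L] - X[L]))
```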

A sensible choice of *Δt* is dictated by

$$\mathbb{E}^{\dagger} \left[ X\_{t\_n, t\_{n+1}}^{(\epsilon)} \otimes X\_{t\_{n+1}, t\_{n+2}}^{(\epsilon)} \right] = \mathcal{O}(\Delta t^2) \,, \tag{83}$$

that is, the sub-sampled data *X(ε)tn* behaves to leading order like solution increments from the reference model (2) at scale *Δt*, independent of the specific value of *ε*. Note that, on the other hand,

$$\mathbb{E}^{\dagger} \left[ X\_{\tau\_l, \tau\_{l+1}}^{(\epsilon)} \otimes X\_{\tau\_{l+1}, \tau\_{l+2}}^{(\epsilon)} \right] = \mathcal{O}(\epsilon^{-1} \Delta \tau^2) \tag{84}$$

for an inner step-size *Δτ* ∼ *ε*. In other words, a suitable step-size *Δt >* 0 can be defined by making

$$h(\Delta t) := \Delta t^{-2} \left\| \mathbb{E}^{\dagger} \left[ X\_{t\_n, t\_{n+1}}^{(\epsilon)} \otimes X\_{t\_{n+1}, t\_{n+2}}^{(\epsilon)} \right] \right\| \tag{85}$$

as small as possible while still guaranteeing an accurate numerical approximation in (62).

*Remark 8* The choice of the outer time step *Δt* is less critical for the EnKBF formulation (74) since it does not rely on sub-sampling the data and is robust with regard to perturbations in the data provided the appropriate *M* is explicitly available or has been estimated from the available data using (81). Furthermore, if *A* is symmetric, then it follows from (75) and the skew-symmetry of the commutator [*., .*] that

$$\int\_{t\_n}^{t\_{n+1}} (AX\_t^{(\epsilon)})^{\mathrm{T}} \mathrm{d}X\_t^{(\epsilon)} = A : \left( X\_{t\_{n+1/2}}^{(\epsilon)} \otimes X\_{t\_n, t\_{n+1}}^{(\epsilon)} \right), \tag{86}$$

which can be used in (74). The same simplification arises when $M$ is symmetric. This insight is at the heart of the geometric rough path approach followed in [9], which starts from the Stratonovich formulation (25) of the EnKBF. See also [28] on the convergence of Wong–Zakai approximations for stochastic differential equations. In all other cases, a more refined numerical approximation of the data-driven integral in (74) is necessary, such as, for example, (31). For that reason, we rely instead on the Itô/Euler–Maruyama interpretation of (68) in this note, that is, the approximation (12).

#### **5 Numerical Example**

We consider the linear SDE (2) with *γ* = 1 and

$$A = \frac{-1}{2} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}. \tag{87}$$

We find that $C = I$ and $A^{\mathrm{T}}A = \frac{1}{2}I$. Hence $(A^{\mathrm{T}}A) : C = 1$, and the posterior variance simply satisfies $\sigma_t = \sigma_0/(1+\sigma_0 t)$ according to (44). We set $m_{\mathrm{prior}} = 0$ and $\sigma_{\mathrm{prior}} = 4$ for the Gaussian prior distribution of $\Theta_0$, and the observation interval is $[0,T]$ with $T = 6$. We find that $\sigma_T = 0.16$. Solving (39) for given $\sigma_t$ with initial condition $m_0 = 0$ yields

$$m\_t = 1 - \frac{\sigma\_t}{\sigma\_0} \tag{88}$$

and $m_T = 0.96$. The corresponding curves are displayed in red in Fig. 2.
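For reference, the closed-form posterior statistics of this scalar example can be evaluated directly; the short sketch below simply encodes $\sigma_t = \sigma_0/(1+\sigma_0 t)$ from (44) and relation (88), with the prior value $\sigma_0 = 4$ used in the text:

```python
# Closed-form posterior statistics of the scalar example:
# sigma_t = sigma_0 / (1 + sigma_0 t) and m_t = 1 - sigma_t / sigma_0.
sigma0, T = 4.0, 6.0

def sigma(t):
    return sigma0 / (1.0 + sigma0 * t)

def m(t):
    return 1.0 - sigma(t) / sigma0

print(sigma(T))  # 0.16
print(m(T))      # 0.96
```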

We implement the EnKBF schemes (20) and (30) with $t_n = n\,\Delta t$. The inner time-step is $\Delta\tau = 10^{-4}$ while $\Delta t = 0.06$, that is, $L = 600$. We repeat the experiment $N = 10^4$ times and compare the outcome with the predicted mean value of $m_T = 0.96$ and the posterior variance of $\sigma_T = 0.16$ in Fig. 2. The differences in the computed time evolutions of $m_t$ and $p_t$ are rather minor and support the idea that it is not necessary to assimilate continuous-time data beyond $\Delta t$. We

**Fig. 2** (**a**–**b**) Frequentist mean $m_t$ and variance $p_t$ from EnKBF implementation (20) with step-size $\Delta t = 0.06$; (**c**–**d**) same results from EnKBF implementation (30) with inner time-step $\Delta\tau = \Delta t/600$. We also display the curves arising for $\sigma_t$ and $m_t$ from the standard Kalman theory using the approximation (22). Note that the posterior variance $\sigma_t$ should provide an upper bound on the frequentist uncertainty $p_t$

**Fig. 3** Same experimental setting as in Fig. 2 but with the data now generated from the multi-scale SDE (58). Again, sub-sampling the data in intervals of $\Delta t = 0.06$ and high-frequency assimilation with step-size $\Delta\tau = 10^{-4}$ lead to very similar results in terms of their frequentist means and variances

also find that the simple prediction (88), based on standard Kalman filter theory, is not very accurate for this low-dimensional problem (*d* = 2). The corresponding approximation for *σt* provides, however, a good upper bound for *pt* .

We now replace the data generating SDE model (2) by the multi-scale formulation (58) with $\epsilon = 0.01$ and $\beta = 2$. This parameter choice agrees with the one used in [9]. We again find that assimilating the data at the slow time-scale $\Delta t = 0.06$ leads to results very similar to those obtained from an assimilation at the fast time-scale $\Delta\tau = 10^{-4}$ with the EnKBF formulation (74), provided the correction term resulting from the second-order iterated integral (73) is included (see Fig. 3). We also verified numerically that $\Delta t = 0.06$ constitutes a nearly optimal step-size in the sense of making (85) sufficiently small while maintaining numerical accuracy. For example, reducing the outer step-size to $\Delta t = 0.02$ leads to $h(0.02) - h(0.06) \approx 10$ in (85).

#### **6 Conclusions**

In this follow-up note to [9], we have investigated the impact of sub-sampling and/or high-frequency data assimilation on the corresponding conditional mean estimators $\mu_t$, both for data generated from the standard SDE model and from a modified multi-scale SDE. A frequentist analysis supports the basic finding that both approaches lead to comparable results provided that the systematic biases due to different second-order iterated integrals are properly accounted for. While the EnKBF is relatively easy to analyse and a full rough path approach can be avoided, extending these results to the nonlinear feedback particle filter [26, 9] will prove more challenging. Extensions to systems without a strong scale separation [4, 31] and applications to geophysical fluid dynamics [22, 12] are also of interest. In this context, the approximation quality of the proposed estimator (81) and the choice of the step-size $\Delta t$ following (85) (and potentially $\Delta\tau$) will be of particular interest. Finally, while we have investigated the univariate parameter estimation problem, a semi-parametric parametrisation of the drift term $f$ in (1), such as via random feature maps [21], leads to high-dimensional parameter estimation problems and their statistics [19, 20]. This provides another fertile direction for future research.

**Acknowledgments** SR has been partially funded by Deutsche Forschungsgemeinschaft (DFG)— Project-ID 318763901—SFB1294 and Project-ID 235221301—SFB1114. He would also like to thank Nikolas Nüsken for many fruitful discussions on the subject of this paper.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Random Ocean Swell-Rays: A Stochastic Framework**

**Valentin Resseguier, Erwan Hascoët, and Bertrand Chapron**

#### **1 Introduction**

Originating from distant storms, swell systems radiate across all ocean basins (Snodgrass et al., 1966; Collard et al., 2009; Ardhuin et al., 2009). Far from their sources, emerging surface waves have low steepness, with very slow amplitude variations. Swell propagation then closely follows the principles of geometrical optics, i.e. the eikonal approximation to the wave equation, with a constant wave period along geodesics when following a wave packet at its group velocity. The phase-averaged evolution of quasi-linear wave fields is then dominated by interactions with underlying current and/or topography changes (Phillips, 1977). Comparable to the propagation of light in a slowly varying medium, over many wavelengths, cumulative effects can lead to refraction, i.e. a change of the direction of propagation of a given wave packet, so that it departs from its initial ray-propagation direction. This opens the possibility of using surface swell systems as probes to estimate turbulence along their propagating path.

For a single progressive swell wave train, a description of the form

$$h(\mathbf{x}, t) = a(\mathbf{x}, t)e^{i\phi(\mathbf{x}, t)},\tag{1}$$

V. Resseguier (-)

Lab, SCALIAN DS, Rennes, France e-mail: valentin.resseguier@scalian.com https://sites.google.com/view/valentinresseguier

E. Hascoët OceanDataLab, Locmaria-Plouzané, France

B. Chapron Laboratoire d'Océanographie Physique et Spatiale (LOPS), Ifremer, Plouzané, France

is locally possible for most wave properties, i.e. the surface elevation, slope, orbital velocities. If the wave-ray propagation is to be followed, or predicted, the phase, *φ(x,t)*, must vary smoothly along the wave's path. Mathematically, *φ(x,t)* is required to be differentiable, to define the relative frequency

$$\omega = -\partial\_t \phi(\mathbf{x}, t), \tag{2}$$

and the wave number vector

$$\mathbf{k} = \nabla \phi(\mathbf{x}, t). \tag{3}$$

These partial derivatives of *φ(x,t)* being independent of the differentiation order, the kinematical conservation equation for the density of waves writes

$$-\nabla \omega = \partial\_t \mathbf{k},\tag{4}$$

with the irrotational condition

$$
\nabla \times \mathbf{k} = 0,\tag{5}
$$

which serves as an initial condition for use with Kelvin's circulation theorem. The rate of change of the wave number is balanced by the convergence of the frequency, i.e. the number of wave crests passing a fixed point.

Let us now consider an ocean moving with velocity *v*, slowly varying with respect to time and space. The frequency of wave crests passing a fixed point, i.e. the apparent frequency, becomes

$$\omega = \omega\_0 + \mathbf{v} \cdot \mathbf{k}, \tag{6}$$

with $\omega_0 = f(k, H)$ the intrinsic frequency, $H$ the depth; the functional dependence of $\omega_0$ on $k$ is known. For gravity waves, this dispersion relationship is

$$\omega\_0 = \sqrt{g \|\mathbf{k}\| \tanh(\|\mathbf{k}\| H)},\tag{7}$$

and thus

$$\partial\_t \mathbf{k} + \partial\_k \omega\_0\, \nabla \|\mathbf{k}\| + \partial\_H \omega\_0\, \nabla H + (\mathbf{l} \cdot \mathbf{v})\, \nabla \|\mathbf{k}\| + \|\mathbf{k}\|\, \nabla (\mathbf{l} \cdot \mathbf{v}) = 0,\tag{8}$$

where $\mathbf{l}$ is a unit vector in the direction of $\mathbf{k}$ and $k = \|\mathbf{k}\|$. Consequently, for a steady wave train, the variation of the wave-number magnitude along the propagation direction $s$ is

$$\partial\_s \|\mathbf{k}\| = -\left(c\_g + \mathbf{l} \cdot \mathbf{v}\right)^{-1} \left[\partial\_H \omega\_0\, \partial\_s H + \|\mathbf{k}\|\, \partial\_s(\mathbf{l} \cdot \mathbf{v})\right],\tag{9}$$

with *cg* = *∂kω*0, the local group velocity. Using the irrotational condition, the evolution of the ray direction, *θ (s)*, follows

$$\partial\_s \theta = -\left(c\_g + \mathbf{l} \cdot \mathbf{v}\right)^{-1} \left[\frac{1}{\|\mathbf{k}\|} \partial\_H \omega\_0\, \partial\_\nu H + \partial\_\nu(\mathbf{l} \cdot \mathbf{v})\right],\tag{10}$$

where $\nu$ is the unit vector normal to the direction of the ray. Accordingly, wave trajectories will bend with depth variations. For deep water, the dispersion relationship reduces to $\omega_0 = \sqrt{gk}$, and $\theta(s)$ solely depends upon the ratio between the cross-ray current gradient and the local group velocity. More generally, this result extends to the ray curvature, which is to first order controlled by $\zeta/c_g$, the ratio between $\zeta = \nabla \times \mathbf{v}$, the vertical component of the current vorticity, and $c_g = \partial_k\omega_0 = \omega/(2k)$, the group velocity. Accordingly, the rays will bend in the direction of decreasing (increasing) current speed. Moreover, a potential velocity field will give little refraction. Yet, a potential velocity field will control the variation of the wave-number magnitude, and thus the group velocity and bending, along the propagation.
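To fix orders of magnitude, the deep-water limit above can be evaluated numerically. The sketch below uses a 250 m wavelength, anticipating the swell considered in Sect. 4; variable names are illustrative:

```python
import numpy as np

g = 9.81             # gravity (m/s^2)
lam = 250.0          # swell wavelength (m), the value used in Sect. 4
k = 2 * np.pi / lam  # wave-number magnitude (rad/m)

def omega0(k, H):
    """Intrinsic frequency of gravity waves, Eq. (7)."""
    return np.sqrt(g * k * np.tanh(k * H))

# Deep-water limit: tanh(kH) -> 1, so omega0 -> sqrt(g k) and c_g = omega0/(2 k).
w_deep = np.sqrt(g * k)
cg = w_deep / (2 * k)
print(f"wave period     : {2 * np.pi / w_deep:.1f} s")
print(f"group speed c_g : {cg:.1f} m/s")   # roughly 10 m/s, as quoted in Sect. 2
```

A 4 km depth already gives $\tanh(kH) \approx 1$ for such a swell, confirming that the deep-water expressions are adequate over the open ocean.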

To specify the local linear wave propagation, a precise knowledge of the surface currents, local gradients and/or vorticity thus appears essential. In a realistic numerical setting, Ardhuin et al. (2017) clearly demonstrated that wave energy variations are largely dominated by the effects of ocean currents at scales of about 10–100 km. From altimeter ocean surface wave energy measurements, Quilfen and Chapron (2019) also showed that mesoscale and sub-mesoscale upper ocean circulation can drive a significant part of the wave variability in the coupled ocean-atmosphere system. Unfortunately, these small-scale currents are not observed and certainly not resolved in operational models. Today, precise spatio-temporal information is thus largely missing. To overcome these observation difficulties, and to best take into account unresolved small-scale currents, a stochastic framework can be adopted. Such a stochastic model shall then provide means to perform fast simulations and to test ensembles of wave-propagation predictions, to best evaluate the impacts of underlying near-surface small-scale currents on the evolution of ocean surface swell systems.

#### **2 Random Swell-Rays**

To first order in wave steepness, the group velocity *v<sup>g</sup>* is modified by the local velocity of the currents *v*,

$$\frac{d\mathbf{x}}{dt} = \mathbf{v}\_g = \nabla\_{\mathbf{k}} \omega = \underbrace{\nabla\_{\mathbf{k}} \omega\_0(\mathbf{k})}\_{\begin{subarray}{c} \text{Group velocity} \\ \text{without currents} \\ \text{but changing wave vector} \end{subarray}} + \mathbf{v}, \tag{11}$$

where *x* is the centroid of a wave group. The ray direction can thus differ from the direction of the wave vector, except in the case of parallel wave and current directions. Unlike depth refraction, the crest alignment does not indicate the wave propagation direction. The coupled wave vector evolution writes


$$\frac{d\mathbf{k}}{dt} = -\nabla \mathbf{v}^T \mathbf{k}.\tag{12}$$

Along the propagation ray, velocity gradients induce linear variations. Decelerating currents will shorten waves, and thus reduce the group velocity. The validity of this coupled ray approximation largely depends on the condition $k\xi \gg 1$, where $\xi$ is a length scale on which the current field is varying, physically corresponding to the typical eddy size. This condition is well satisfied for wave numbers of interest, of order $k \sim 2\pi/250$ rad m$^{-1}$, and typical eddy size $\xi \sim 5$ km or larger. Scattering of the waves by currents can further be assumed to be weak, with $v$ of order 0.5 m/s, much smaller than $v_g$ of order 10 m/s. Subsequently, each ray will be deflected by a small scattering angle, of order $\sim v/v_g$, after traveling a typical correlation length $\sim\xi$ along the mean wave vector direction.
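The coupled ray equations (11) and (12) are straightforward to integrate numerically. The sketch below advances one deep-water ray through a single analytic Gaussian vortex; the current model, its amplitude and its length scale are illustrative assumptions, not the SQG field of Sect. 4:

```python
import numpy as np

g = 9.81  # gravity (m/s^2)

def current(x):
    """Illustrative steady divergence-free current: one Gaussian vortex
    (a toy stand-in for the unresolved field)."""
    x0, ell, gamma = np.array([50e3, 50e3]), 10e3, 0.5
    r = x - x0
    amp = gamma * np.exp(-np.sum(r**2) / (2 * ell**2))
    return amp * np.array([-r[1], r[0]]) / ell  # azimuthal velocity (m/s)

def grad_current(x, h=1.0):
    """Finite-difference Jacobian J[i, j] = d v_i / d x_j."""
    J = np.empty((2, 2))
    for j in range(2):
        e = np.zeros(2)
        e[j] = h
        J[:, j] = (current(x + e) - current(x - e)) / (2 * h)
    return J

def step(x, k, dt):
    """One explicit Euler step of the ray equations (11)-(12), deep water."""
    kn = np.linalg.norm(k)
    cg = 0.5 * np.sqrt(g / kn) * k / kn      # intrinsic group velocity
    x_new = x + (cg + current(x)) * dt       # Eq. (11)
    k_new = k - grad_current(x).T @ k * dt   # Eq. (12): dk/dt = -(grad v)^T k
    return x_new, k_new

x = np.array([50e3, 0.0])                    # start on the southern edge
k = np.array([0.0, 2 * np.pi / 250.0])       # northward 250 m swell
for _ in range(2000):
    x, k = step(x, k, dt=5.0)
print("final position (km):", x / 1e3)
print("deflection (deg):", np.degrees(np.arctan2(k[0], k[1])))
```

With a 0.5 m/s vortex against a roughly 10 m/s group speed, the resulting deflection stays of order $v/v_g$, a few degrees, consistent with the scaling argument above.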

To complete the wave field description, the wave action *A(x,t)* is considered to be an adiabatic invariant. Wave action is crucial to anticipate wave transformations by currents (White and Fornberg, 1998). This action is the integral of the action spectrum *N (x, k,t)* over all the wave-vectors *k*:

$$A(\mathbf{x},t) = \int d\mathbf{k} \, N(\mathbf{x}, \mathbf{k}, t). \tag{13}$$

The wave action spectrum $N$ is the action per unit surface (unit of $\mathbf{x}$) and per unit wave-vector surface (unit of $\mathbf{k}$). For linear waves, the wave action spectrum is simply related to the wave energy spectrum $E$:

$$E(\mathbf{x}, \mathbf{k}, t) = N(\mathbf{x}, \mathbf{k}, t) \,\,\omega\_0(\mathbf{k}). \tag{14}$$

By the Liouville theorem, the $(\mathbf{x}, \mathbf{k})$ space does not contract nor dilate along time.<sup>1</sup> Since the dissipation is neglected, the wave action spectrum $N$ is thus conserved (Lavrenov, 2013), i.e.

$$N\left(\mathbf{x}(t\_i), \mathbf{k}(t\_i), t\_i\right) = N\left(\mathbf{x}(t\_f), \mathbf{k}(t\_f), t\_f\right),\tag{15}$$

along the following $(\mathbf{x}, \mathbf{k})$ variable change between the initial time $t_i$ and the final time $t_f$:

$$\begin{pmatrix} \mathbf{x}(t\_i) \\ \mathbf{k}(t\_i) \end{pmatrix} \mapsto \begin{pmatrix} \mathbf{x}(t\_f) \\ \mathbf{k}(t\_f) \end{pmatrix}. \tag{16}$$

<sup>1</sup> Indeed,
$$\begin{bmatrix} \nabla\_{\mathbf{x}} \\ \nabla\_{\mathbf{k}} \end{bmatrix} \cdot \left( \frac{d}{dt} \begin{bmatrix} \mathbf{x} \\ \mathbf{k} \end{bmatrix} \right) = \begin{bmatrix} \nabla\_{\mathbf{x}} \\ \nabla\_{\mathbf{k}} \end{bmatrix} \cdot \begin{bmatrix} \mathbf{v}\_g \\ -\nabla\_{\mathbf{x}} \mathbf{v}^T \mathbf{k} \end{bmatrix} = \nabla\_{\mathbf{x}} \cdot \mathbf{v} - \nabla\_{\mathbf{x}} \cdot \mathbf{v} = 0.$$

Subsequently, each Fourier mode of a swell wave train can be modified independently of the others. In the absence of source terms, the action spectrum conservation (15) then writes:

$$\frac{dN}{dt} = \partial\_t N + \mathbf{v}\_g \cdot \nabla\_{\mathbf{x}} N + \left(-\nabla\_{\mathbf{x}} \mathbf{v}^T \mathbf{k}\right) \cdot \nabla\_{\mathbf{k}} N = 0. \tag{17}$$

#### **3 The Time-Decorrelation Assumption**

Now, the Eulerian current $\mathbf{v}$ is decomposed into a large-scale component $\overline{\mathbf{v}}$ and a small-scale unresolved component $\mathbf{v}'$:

$$\mathbf{v} = \overline{\mathbf{v}} + \mathbf{v}'. \tag{18}$$

In a stochastic framework, we can work with the Stratonovich notation (Oksendal, 1998; Kunita, 1997). Under Stratonovich calculus rules, expressions become similar to deterministic ones. The Stratonovich dispersion relation is analogous to the deterministic one (6). The method of characteristics is also valid, (11), (12), and (15), with $\mathbf{v}'$ defined by $\sigma \circ \mathrm{d}B_t/\mathrm{d}t$, where $\mathrm{d}B_t/\mathrm{d}t$ is a spatio-temporal white noise and $\sigma\circ$ denotes a spatial filter which encodes spatial correlations and horizontal incompressibility ($\nabla \cdot \sigma = 0$). For a spatially stationary and isotropic small-scale velocity, the wave characteristic dynamics equations (11), (12) and (15) would then also remain the same with Itô notation (i.e. we can replace $\sigma \circ \mathrm{d}B_t$ by $\sigma\,\mathrm{d}B_t$ to derive the evolution). With Itô notation, the action spectrum conservation (17) writes

$$\partial\_t N + \mathbf{v}\_g \cdot \nabla\_{\mathbf{x}} N + \left(-\nabla\_{\mathbf{x}} \mathbf{v}^T \mathbf{k}\right) \cdot \nabla\_{\mathbf{k}} N = \begin{bmatrix} \nabla\_{\mathbf{x}} \\ \nabla\_{\mathbf{k}} \end{bmatrix} \cdot \left(D \begin{bmatrix} \nabla\_{\mathbf{x}} \\ \nabla\_{\mathbf{k}} \end{bmatrix} N\right), \tag{19}$$

where $\mathbf{v}_g$ and $\mathbf{v}$ include the random small-scale component $\mathbf{v}' = \sigma\,\mathrm{d}B_t/\mathrm{d}t$, and

$$D = \frac{1}{2dt} \mathbb{E}\left\{ \begin{bmatrix} \sigma \mathrm{d}B\_t \\ -\nabla\_{\mathbf{x}} (\sigma \mathrm{d}B\_t)^T \mathbf{k} \end{bmatrix} \begin{bmatrix} \sigma \mathrm{d}B\_t \\ -\nabla\_{\mathbf{x}} (\sigma \mathrm{d}B\_t)^T \mathbf{k} \end{bmatrix}^T \right\}. \tag{20}$$

Compared to (17), an RHS diffusive term appears, likely acting to increase the initial directional spread of the incident, very directional swell components.

Voronovich (1991) and White and Fornberg (1998) discussed the joint random evolution changes of the coupled *(x, k)*, i.e. the location and the wave vector of waves, subject to a random current *v*. Considering the wave train to undergo slow changes over the typical time to travel through the typical correlation length of the underlying current, the joint time evolution of *(x, k)* can be approximated to be driven by a diffusion Markov process.

#### *3.1 The Ray Lagrangian Correlation Time*

To apply (19), the covariance of the small-scale unresolved component $\mathbf{v}'$, in the wave group frame, is thus to be assessed:

$$\gamma\_{v'}^{X\_r}(t) = \mathbb{E}\left(\mathbf{v}'(t', X\_r(t')) \cdot \mathbf{v}'(t' + t, X\_r(t' + t))\right) = \gamma\_{v'}\left(t, X\_r(t' + t) - X\_r(t')\right), \tag{21}$$

where $\gamma_{v'}$ is the (Eulerian) spatio-temporal covariance of $\mathbf{v}'$, assuming statistical homogeneity and stationarity for $\mathbf{v}'$. Assuming a typical isotropic form for this covariance,

$$\gamma\_{v'}(t, \mathbf{x}) = \chi \left( \frac{|t|}{\tau\_{v'}} + \frac{\|\mathbf{x}\|}{l\_{v'}} \right), \tag{22}$$

then,

$$\gamma\_{v'}^{X\_r}(t) = \chi\left(\frac{|t|}{\tau\_{v'}} + \frac{\|X\_r(t'+t) - X\_r(t')\|}{l\_{v'}}\right) = \chi\left(\left(\frac{1}{\tau\_{v'}} + \frac{\|\mathbf{v}\_g\|}{l\_{v'}}\right)|t| + O(t^2)\right),\tag{23}$$

for small time increment $t$. Therefore, $\left(\frac{1}{\tau_{v'}} + \frac{\|\mathbf{v}_g\|}{l_{v'}}\right)^{-1}$ is the correlation time of $\mathbf{v}'(t, X_r(t))$. The same derivation is valid for $\nabla(\mathbf{v}')^T(t, X_r(t))$. Over the deep ocean, the swell group velocity magnitude is $\|\mathbf{v}_g^0\| = \|\nabla_{\mathbf{k}}\omega_0\| = \frac{1}{2}\sqrt{g/k}$, and the along-ray correlation time of the small-scale velocity can be approximated by $l_{v'}/\|\mathbf{v}_g^0\|$. The ratio between this along-ray correlation time and the characteristic time of evolution of the wave group properties will then control the time-decorrelation assumption for $\mathbf{v}'$:

$$\epsilon = \frac{l\_{v'}}{\|\mathbf{v}\_g^0\|} \|\nabla \mathbf{v}^T\|. \tag{24}$$

Note the Eulerian small-scale velocity $\mathbf{v}'$ is not necessarily time-uncorrelated. Yet, for small enough $\epsilon$, the Lagrangian small-scale velocity along the ray can be considered time-uncorrelated. From the expression of $\epsilon$, such a condition depends upon:


#### *3.2 Ray Absolute Diffusivity*

The absolute diffusivity (or Kubo-type formula) usually corresponds, in the so-called diffusive regime, to the variance per unit of time of a fluid particle Lagrangian path $\frac{dX}{dt} = \mathbf{v}$. It is approximately equal to the velocity variance times its correlation time. The Eulerian velocity covariance (22) will thus induce an absolute diffusivity

$$a = \int\_0^\infty dt\ \gamma\_{v'}(t, X(t' + t) - X(t')) \approx \chi(0)\, \tau\_{v'}.\tag{25}$$

Here, a wave group is followed along its propagation, and a ray absolute diffusivity slightly differs from the usual absolute diffusivity to become

$$a^{X\_r} = \int\_0^\infty dt\ \gamma\_{v'}^{X\_r}(t) \approx \left(\frac{1}{\tau\_{v'}} + \frac{\|\mathbf{v}\_g\|}{l\_{v'}}\right)^{-1} \chi(0) \approx \frac{l\_{v'}}{\|\mathbf{v}\_g^0\|}\, \chi(0). \tag{26}$$

In Fourier space, the current Absolute Diffusivity Spectral Density (ADSD) (Resseguier et al., 2020) associated with the wave dynamics is defined by

$$A^{X\_r}(k) = \frac{1/k}{\|\mathbf{v}\_g^0(\mathbf{k}^{X\_r})\|}\ E\_k(k),\tag{27}$$

where $\mathbf{k}^{X_r}$ denotes the wave's wave-vector, $k$ the current wave number and $E_k$ the current kinetic energy spectrum. Accordingly, for noise calibration, we assume $A^{X_r}$ self-similar and we choose a divergence-free spatial filter $\nabla^\perp \breve{\psi}_\sigma$ such that $\mathbf{v}' = \sigma\,\mathrm{d}B_t/\mathrm{d}t = \nabla^\perp \breve{\psi}_\sigma * \mathrm{d}B_t/\mathrm{d}t$ and $\|\widehat{\sigma\mathrm{d}B_t}(k)\|^2/\mathrm{d}t = |k\, \hat{\breve{\psi}}_\sigma(k)|^2 = A^{X_r}(k)$.
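To give an order of magnitude, the ray absolute diffusivity (26) can be evaluated with the values used later in Sect. 4; the velocity variance $\chi(0)$ below is an assumed value, not taken from this chapter:

```python
# Order-of-magnitude evaluation of the ray absolute diffusivity (26),
# a^{X_r} ~ (l_v' / ||v_g^0||) * chi(0). The length and group speed follow
# Sect. 4; chi(0), the small-scale velocity variance, is an assumption.
l_vp = 31.3e3    # correlation length of v' (m)
vg0 = 10.0       # deep-water swell group speed (m/s)
chi0 = 0.1**2    # velocity variance (m^2/s^2), assumed
a_ray = (l_vp / vg0) * chi0
print(f"a^Xr ~ {a_ray:.1f} m^2/s")
```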

#### *3.3 A Practical Estimation*

To simplify (20), let us consider the solution for a homogeneous and isotropic small-scale velocity $\mathbf{v}' = \sigma\,\mathrm{d}B_t/\mathrm{d}t = \nabla^\perp \breve{\psi}_\sigma * \mathrm{d}B_t/\mathrm{d}t$ with a Matérn stream function covariance, $(\breve{\psi}_\sigma * \breve{\psi}_\sigma)$, leading to

$$D = \frac{1}{2dt} \begin{bmatrix} \mathbb{E}\left\{ (\sigma \mathrm{d}B\_t)(\sigma \mathrm{d}B\_t)^T \right\} & 0 \\ 0 & \sum\_{i,j=1}^2 k\_i k\_j\, \mathbb{E}\left\{ (\nabla\_{\mathbf{x}} (\sigma \mathrm{d}B\_t)\_i)(\nabla\_{\mathbf{x}} (\sigma \mathrm{d}B\_t)\_j)^T \right\} \end{bmatrix} = \begin{bmatrix} \frac{a\_0}{2} \mathbb{I}\_d & 0 \\ 0 & \frac{c\_{\kappa\_M}}{2} \left( \mathbf{k}\mathbf{k}^T + 3\, \mathbf{k}^\perp \left( \mathbf{k}^\perp \right)^T \right) \end{bmatrix}, \tag{29}$$

where $a_0 = \frac{1}{2dt}\mathbb{E}\|\sigma\mathrm{d}B_t\|^2$ and $c_{\kappa_M} = \frac{1}{8dt}\mathbb{E}\|\nabla_{\mathbf{x}}(\sigma\mathrm{d}B_t)^T\|^2$ are constants depending on both the correlation length and the spectrum slope of the small-scale velocity. The Itô action spectrum equation (19) then reads:

$$\begin{aligned} \partial\_t N + \mathbf{v}\_g \cdot \nabla\_{\mathbf{x}} N &+ \left(-\nabla\_{\mathbf{x}} \mathbf{v}^T \mathbf{k}\right) \cdot \nabla\_{\mathbf{k}} N \\ &= \nabla\_{\mathbf{x}} \cdot \left(\frac{1}{2} a\_0 \nabla\_{\mathbf{x}} N\right) + \nabla\_{\mathbf{k}} \cdot \left(\frac{1}{2} c\_{\kappa\_M} \left[\mathbf{k}\mathbf{k}^T + 3\, \mathbf{k}^\perp \left(\mathbf{k}^\perp\right)^T\right] \nabla\_{\mathbf{k}} N\right), \end{aligned} \tag{30}$$

$$= \frac{1}{2} a\_0 \Delta\_{\mathbf{x}} N + \frac{1}{2} c\_{\kappa\_M} \frac{1}{\|\mathbf{k}\|} \partial\_{\|\mathbf{k}\|} \left( \|\mathbf{k}\|^3 \partial\_{\|\mathbf{k}\|} N \right) + \frac{3}{2} c\_{\kappa\_M}\, \partial\_{\theta\_{\mathbf{k}}}^2 N. \tag{31}$$

The ensemble mean then follows:

$$\begin{split} & \partial\_t \mathbb{E}N + \overline{\mathbf{v}}\_g \cdot \nabla\_{\mathbf{x}} \mathbb{E}N + \left(-\nabla\_{\mathbf{x}} \overline{\mathbf{v}}^T \mathbf{k}\right) \cdot \nabla\_{\mathbf{k}} \mathbb{E}N \\ &= \frac{1}{2} a\_0 \Delta\_{\mathbf{x}} \mathbb{E}N + \frac{1}{2} c\_{\kappa\_M} \frac{1}{\|\mathbf{k}\|} \partial\_{\|\mathbf{k}\|} \left(\|\mathbf{k}\|^3 \partial\_{\|\mathbf{k}\|} \mathbb{E}N\right) + \frac{3}{2} c\_{\kappa\_M}\, \partial\_{\theta\_{\mathbf{k}}}^2 \mathbb{E}N. \end{split} \tag{32}$$

This last RHS diffusion term, along the ray direction $\theta$, is reminiscent of Eq. 3.16 in Bôas and Young (2020) and of Eq. 36 in Smit and Janssen (2019), derived under the same isotropic and homogeneous turbulence assumptions.
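The constant $a_0$ entering (29)–(31) can be checked by straightforward Monte-Carlo averaging of sampled noise increments. The sketch below uses a toy isotropic increment with a known per-component variance rate $q$ (an assumption for illustration; the actual noise would be sampled from the calibrated filter $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(1)
dt, q, n = 1e-3, 0.25, 200_000   # q: per-component variance rate (toy value)

# Sample n increments of a two-component isotropic toy noise sigma dB_t,
# each component ~ N(0, q dt); for such a noise a_0 = E||sigma dB_t||^2 / (2 dt) = q.
dB = np.sqrt(q * dt) * rng.standard_normal((n, 2))
a0_hat = np.mean(np.sum(dB**2, axis=1)) / (2 * dt)
print(f"a0 estimate: {a0_hat:.4f} (exact value for this toy noise: {q})")
```

The same empirical averaging applies to $c_{\kappa_M}$, replacing the increments by their spatial gradients.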

#### **4 Numerical Simulations**

To illustrate our purpose, we consider the Surface Quasi-Geostrophic dynamics (Pierrehumbert, 1994; Lapeyre, 2017), abbreviated SQG:

$$(\partial\_t + \mathbf{v} \cdot \nabla) \left( -\frac{b}{N} \right) = 0 \quad\text{with}\quad \mathbf{v} = \mathbf{v}\_{\mathrm{SQG}} = -\nabla^\perp (-\Delta)^{-1/2} \left( -\frac{b}{N} \right). \tag{33}$$

Note, real upper-ocean currents may not strictly follow SQG dynamics. Still, after a wind burst, it can be a good approximation at many mid-latitude locations. SQG corresponds to dynamics with extreme locality, i.e. a KE spectrum with a shallow slope $-5/3$. Hence, for a fixed KE value, a larger current gradient $\nabla \mathbf{v}^T$ is expected. The validity of the time-decorrelation assumption of Sect. 3 will then depend upon the scale separation, which defines the correlation length of the unresolved scales.

A reference simulation is obtained at a resolution 512 × 512 for a 1000-km squared domain, through a pseudo-spectral code (Resseguier et al., 2017, 2020).

Once initialized, the current velocity $\mathbf{v}$ is about 0.1 m s$^{-1}$.

A swell system enters the southern boundary, propagating to the north. The carrier incident wave has a wavelength $\lambda = 250$ m. Its envelope is Gaussian, with an isotropic spatial extension of $30\lambda$. Figure 1 illustrates the branched regime in this homogeneous SQG turbulence. This regime spreads the positions (left panel) and wave vectors (right panel) of the incoming waves. From south to north, spectral diffusion occurs (right panel), in the direction orthogonal (here $k_x$) to the propagation (here $k_y$). This accelerates, along the propagation, the zonal spread of wave positions, creating the branched regime visible in the left panel. This acceleration is explained by the ray equation (11), dominated by the intrinsic wave group velocity $\nabla_{\mathbf{k}}\omega_0 = \|\nabla_{\mathbf{k}}\omega_0\|\, \mathbf{k}/\|\mathbf{k}\|$.

**Fig. 1** Swell interacting with a high-resolution (512 × 512) deterministic SQG current. The left panel shows ray trajectories computed by forward advection and superimposed on the current vorticity $\omega = \nabla^\perp \cdot \mathbf{v}$. The right panel shows bidirectional wave spectra, computed by backward advection, at 8 locations along a meridional axis (the mean wave propagation direction)

To mimic a badly resolved current, $\mathbf{v}$ is smoothed at a resolution of 32 × 32. Wave dynamics using this coarse-scale current are shown in Fig. 2. The branched regime is strongly weakened, i.e. the spectral small-scale turbulence diffusion is missing.

A stochastic current is then added to this coarse deterministic one. That stochastic component is divergence-free and has a self-similar distribution of energy across spatial scales. Its precise parametrisation is a modification of the ADSD calibration (Resseguier et al., 2020) (see Sect. 3.2). Figure 3 displays the wave simulations. This white-in-time model appears to work for a sufficiently well-resolved large-scale current. Indeed, the decorrelation ratio $\epsilon = (l_{v'}/\|\mathbf{v}_g^0\|)\,\|\nabla \mathbf{v}^T\|$ depends on this resolution through $l_{v'}$. Specifically, for this SQG flow, the large-scale current $\mathbf{v}$ needs to be resolved at least on a 32 × 32 grid, i.e. with a resolution $l_{v'} = 31.3$ km. As such, we obtain $\epsilon = 3.23 \times 10^{-2}$ (computed with $1/\|\nabla \mathbf{v}^T\| = 1.38 \times 10^5$ s and $\|\mathbf{v}_g^0\| \approx 10$ m s$^{-1}$).

#### **5 Conclusion**

The presence of velocity variations results in random scattering of swell-wave rays. Interactions are weak, but cumulative effects can become significant, increasing the average path length taken by the swell energy to reach an observer. Nowadays, sufficiently precise measurements open the possibility of using along-ray measurements to probe the near-surface ocean turbulence. Under a Lagrangian time-decorrelation assumption and using geometrical optics, a practical stochastic framework helps express these scattering effects on the mean swell-action statistics, directly in terms of the KE spectrum of the unresolved surface current field. Results are presented in both Lagrangian and Eulerian forms, where the latter augments the initial radiative transport equation with a diffusive term in directional space. Measured delays in swell arrivals, estimated wave height spectral characteristics and decays, and/or varying directional spread of the swell field shall then be more quantitatively interpreted to infer regional and seasonal upper ocean dynamical properties.

**Acknowledgments** This work is supported by the R&T CNES R-S19/OT-0003-084, the ERC project 856408-STUOD, the European Space Agency World Ocean Current project (ESA Contract No. 4000130730/20/I-NB), and SCALIAN DS.

**Fig. 2** Swell interacting with a low-resolution (32 × 32) deterministic SQG current. The left panel shows ray trajectories computed by forward advection and superimposed on the low-resolution current vorticity *ω* = **∇**<sup>⊥</sup> **·** *v*. The right panel shows bidirectional wave spectra, computed by backward advection, at 8 locations along a meridional axis (the mean wave propagation direction)

**Fig. 3** Swell interacting with a low-resolution (32 × 32) deterministic SQG current plus (one realization of) the time-uncorrelated stochastic model. Ray trajectories are computed by forward advection and superimposed on the low-resolution current vorticity *ω* = **∇**<sup>⊥</sup> **·** *v*

#### **References**

Ardhuin F, Chapron B, Collard F (2009) Observation of swell dissipation across oceans. Geophysical Research Letters 36(6)

Ardhuin F, Gille ST, Menemenlis D, Rocha CB, Rascle N, Chapron B, Gula J, Molemaker J (2017) Small-scale open ocean currents have large effects on wind wave heights. Journal of Geophysical Research: Oceans 122(6):4500–4517


Lapeyre G (2017) Surface quasi-geostrophy. Fluids 2(1):7


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Modified (Hyper-)Viscosity for Coarse-Resolution Ocean Models**

**Louis Thiry, Long Li, and Etienne Mémin**

**Abstract** We present a simple parameterization for coarse-resolution ocean models. To replace computationally expensive high-resolution ocean models, we develop a computationally cheap parameterization for coarse-resolution models based solely on a modification of the viscosity term in the advection equations. It is meant to reproduce mean quantities such as pressure, velocity, or vorticity computed from a high-resolution reference solution or from observations. We test this new parameterization on a double-gyre quasi-geostrophic model in the eddy-permitting regime. Our results show that the proposed scheme significantly improves the energy statistics and the intrinsic variability on the coarse mesh. This method shall serve as a deterministic basis for coarse-resolution stochastic parameterizations in future work.

#### **1 Introduction**

Ocean general circulation models used at climatic scales are limited, for evident computational reasons, to horizontal resolutions too coarse to correctly resolve ocean mesoscale and sub-mesoscale eddies, even on large computational infrastructures. The horizontal resolution of the most recent climatic ocean models is of the order of the Rossby radius of deformation. These models are hence in the so-called eddy-permitting regime and can partially resolve the mesoscale (i.e. 10–100 km) eddy field. They nevertheless suffer from strong limitations. In particular, they are unable to reproduce accurately large-scale structures such as the eastward turbulent jet in an idealized double-gyre configuration.

Recent parameterizations have shown significant improvements of coarse-resolution models compared to high-resolution reference solutions [2]. However, it remains an important topic of research, as the current generation of parameterizations

L. Thiry (-) · L. Li · E. Mémin

INRIA/IRMAR, Rennes, France

e-mail: louis.thiry@inria.fr

<sup>©</sup> The Author(s) 2023 B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_17

is not yet fully able to represent the effects of the unresolved scales on the large-scale flow structures.

A wide range of subgrid parameterizations relies on eddy viscosity, such as Laplacian and biharmonic schemes [16, 10, 4, 3]. It has been shown in [9] that including only such (hyper-)viscosities in coarse-resolution models often causes too much dissipation and results in an artificial energy sink at large scales. In general, even eddy-permitting models are not energetic enough, and as a result the long-time average of any coarse model's variable of interest departs completely from the long-time average of a high-resolution model subsampled at the same scale. This is the main motivation of the present work. In particular, we would like to answer the following question: how can we reduce the excessive loss of resolved kinetic energy due to the viscosity while simultaneously ensuring numerical stability?

We propose a simple affine parameterization of the (hyper-)viscosity. The (bi-)Laplacian operator applied to a field $f$ is replaced by the same operator applied to $f - f'$, where $f'$ is a field of the same dimension as $f$ that does not depend upon time. We interpret this method as a mathematical regularization technique guiding the solutions towards prior information. We frame $f'$ as the solution of an optimal control problem that aims to reproduce statistics computed from a reference solution or from observations, and we present a method to solve this optimal control problem.

We test the proposed method on an idealized double-gyre configuration. For that purpose, we release with this article a fast, concise, and CPU-GPU portable Pytorch implementation of a multi-layer quasi-geostrophic model on a rectangular domain, within which we implement and test our optimization procedure.

This article is organized as follows: in Sect. 2 we present the double-gyre quasi-geostrophic model we use and detail its implementation, in Sect. 3 we present our *modified viscosity* parameterization, and in Sect. 4 we show and discuss numerical results.

#### **2 Double Gyre Quasi-Geostrophic Model**

#### *2.1 Governing Equations*

We use the same multi-layer quasi-geostrophic model in a non-periodic rectangular domain as in [6]. Here, we only give a brief review of this system. The quasi-geostrophic pressure and potential vorticity (PV) are stacked in three isopycnal layers. We adopt vector forms to denote the layered pressure and PV:

$$\mathbf{p} = \begin{bmatrix} p\_1 \\ p\_2 \\ p\_3 \end{bmatrix}, \quad \mathbf{q} = \begin{bmatrix} q\_1 \\ q\_2 \\ q\_3 \end{bmatrix}.$$

The forced and damped quasi-geostrophic (QG) equations can be then written as

$$
\partial\_t \mathbf{q} = \frac{1}{f\_0} J(\mathbf{q}, \mathbf{p}) + f\_0 B \mathbf{e} + \frac{1}{f\_0} (a\_2 \Delta - a\_4 \Delta^2)(\Delta \mathbf{p}), \tag{1}
$$

$$(\Delta - f\_0^2 A)\mathbf{p} = f\_0 \mathbf{q} - f\_0 \beta (\mathbf{y} - \mathbf{y}\_0),\tag{2}$$

where $\Delta = \partial^2\_{xx} + \partial^2\_{yy}$ denotes the horizontal Laplacian, $\Delta^2$ the bi-Laplacian operator, $J(a, b) = \partial\_x a\, \partial\_y b - \partial\_x b\, \partial\_y a$ stands for the Jacobian operator, $f\_0 + \beta(y - y\_0)$ is the Coriolis parameter under the beta-plane approximation with meridional axis center $y\_0$, and $a\_2$ and $a\_4$ are the Laplacian and biharmonic viscosity coefficients. Parameters of the configuration are listed in Tables A.1 and A.2 in the Appendix. The second term on the right-hand side of Eq. (1) represents the external forcing applied on the different layers. In this work, we only consider an idealized case in which the ocean basin is driven by a stationary and symmetric wind stress $\boldsymbol{\tau} = (\tau^x, \tau^y)$ on the surface and by a linear Ekman stress at the bottom. In that case, the forcing term can be specified by

$$B = \begin{bmatrix} \frac{1}{H\_1} & \frac{-1}{H\_1} & 0 & 0\\ 0 & \frac{1}{H\_2} & \frac{-1}{H\_2} & 0\\ 0 & 0 & \frac{1}{H\_3} & \frac{-1}{H\_3} \end{bmatrix}, \qquad \mathbf{e} = \begin{bmatrix} \partial\_x \tau^y - \partial\_y \tau^x\\ 0\\ 0\\ \frac{\delta\_{\mathrm{ek}}}{2|f\_0|} \Delta p\_3 \end{bmatrix}, \qquad \boldsymbol{\tau} = \tau\_0 \begin{bmatrix} -\cos(2\pi y/L\_y) \\ 0 \end{bmatrix},$$

where $\tau\_0$ is the magnitude of the surface wind stress, $H\_k$ is the background thickness of layer $k$, and $\delta\_{\mathrm{ek}}$ is the bottom Ekman layer thickness. The vertical stratification of the model is described by the term $-f\_0^2 A\mathbf{p}$ in Eq. (2) with

$$A = \begin{bmatrix} \frac{1}{H\_1 g'\_{1.5}} & \frac{-1}{H\_1 g'\_{1.5}} & 0\\ \frac{-1}{H\_2 g'\_{1.5}} & \frac{1}{H\_2}\left(\frac{1}{g'\_{1.5}} + \frac{1}{g'\_{2.5}}\right) & \frac{-1}{H\_2 g'\_{2.5}}\\ 0 & \frac{-1}{H\_3 g'\_{2.5}} & \frac{1}{H\_3 g'\_{2.5}} \end{bmatrix},$$

where $g'\_{k+0.5}$ is the reduced gravity defined across the interface between layers $k$ and $k+1$. A multi-layered generalization of this model can be found in [5]. Note also that such a multi-layered model can be considered as a vertically discretized approximation of the continuously stratified QG system [17], with $\partial\_z(f\_0 \partial\_z \mathbf{p}/N^2) \approx -f\_0 A\mathbf{p}$ approximated by finite differences, and in which $N$ denotes the buoyancy (or Brunt–Väisälä) frequency.

#### *2.2 Pytorch Implementation*

To facilitate numerical developments and benefit from built-in automatic differentiation, we develop a Pytorch [12] implementation of the above-described multi-layer QG model.<sup>1</sup> For this purpose, we follow rigorously the strategy of [7]:


Detailed equations and numerical routine design choices can be found in [7]. We use a Heun (second-order Runge–Kutta) time-stepping scheme instead of the leap-frog scheme used in [7].
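The Heun scheme advances the state with a forward-Euler predictor followed by a trapezoidal corrector. A minimal NumPy sketch (illustrative only, not the released Pytorch code; `f` stands for the full right-hand side of the advection equation):

```python
import numpy as np

def heun_step(f, q, dt):
    """One Heun (RK2) step for dq/dt = f(q)."""
    k1 = f(q)                        # slope at the current state
    k2 = f(q + dt * k1)              # slope at the Euler-predicted state
    return q + 0.5 * dt * (k1 + k2)  # trapezoidal corrector

# Example on dq/dt = -q, whose exact solution is exp(-t)
q = np.array([1.0])
for _ in range(100):
    q = heun_step(lambda x: -x, q, 0.01)
# after t = 1, q is close to exp(-1), with second-order accuracy in dt
```

Unlike leap-frog, Heun is self-starting and needs no Asselin-type time filter.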

For the sake of numerical efficiency, we follow the recommendations of [14]: we compile computationally demanding routines and simplify finite-difference calculations by reducing as much as possible the number of multiplications. We end up with a very concise code (fewer than 300 lines) that depends only upon the Numpy and Pytorch libraries. This implementation will be open-sourced at the time of publication.

#### *2.3 Eddy-Resolving and Eddy-Permitting Regimes*

We consider two spatial settings for our simulations:


Parameters for these two regimes are listed in Table A.2 in the Appendix.

Shevchenko and Berloff [15] studied the differences between the flows resulting from these two regimes. The high-resolution eddy-resolving model shows a well-pronounced eastward jet fuelled by circulating mesoscale eddies, while the low-resolution eddy-permitting model does not produce a proper eastward jet, as shown in Fig. 1. Temporal statistics also differ significantly between the high- and low-resolution simulations.

<sup>1</sup> Available at https://github.com/louity/qgm\_pytorch.

**Fig. 1** (Top) high-resolution and (bottom) low-resolution top-layer snapshots after 400 years of integration starting from zero velocity. Velocities are in m s<sup>−1</sup> and PV in s<sup>−1</sup>

#### **3 Proposed Modified Viscosity**

#### *3.1 Motivation*

In both resolutions, we use biharmonic viscosity as in [16, 10, 4, 3], essentially because it is less dissipative at large scales than a Laplacian viscosity and hence better preserves large-scale structures. However, hyperviscosity remains much too dissipative in the "eddy-permitting" regime [9]. This excessive dissipation *kills* the eastward jet that is present in the high-resolution solution and that we expect to see in such a double-gyre quasi-geostrophic model. Figure 2 shows a sequence of snapshots of the low-resolution model initialized with a downsampled snapshot of the high-resolution solution (see the Appendix for details on downsampling). After as few as three years, the eastward jet has almost disappeared, showing that the model is too dissipative. Lowering the hyperviscosity coefficient by a factor of 10 does not solve this problem, and creates spurious gradients in the potential vorticity, as shown in Fig. 2. These numerical artifacts are due to a bad representation of the direct enstrophy cascade, causing a pile-up of small-scale vorticity gradients at the cut-off frequency together with aliasing effects.

**Fig. 2** (Left) Initial condition: high-resolution snapshot on the low-resolution grid. (Center and right) Zonal velocity and potential vorticity (PV) snapshots after 3 years of integration at low resolution with Eqs. (1, 2) with (top) standard hyper-viscosity and (bottom) 10 times smaller hyper-viscosity. Aliasing effects are visible on the potential vorticity snapshots integrated with low hyper-viscosity

#### *3.2 Modified Viscosity*

Here we propose a simple affine modification of the hyperviscosity. We add a bias to the term $\Delta\mathbf{p}$ in Eq. (1), which becomes $\Delta(\mathbf{p} - \mathbf{p}')$, where $\mathbf{p}'$ is a field of the same dimension as $\mathbf{p}$ that does not depend upon time. The PV advection equation with hyperviscosity becomes

$$
\partial\_t \mathbf{q} = \frac{1}{f\_0} J(\mathbf{q}, \mathbf{p}) + f\_0 B \mathbf{e} + \frac{1}{f\_0} (a\_2 \Delta - a\_4 \Delta^2) \left(\Delta(\mathbf{p} - \mathbf{p}')\right). \tag{3}
$$

The elliptic equation (2) remains unchanged.

The goal of this additional term is to reproduce a relevant time-averaged pressure field derived from observations or high-resolution solutions. For example, the high-resolution average $\overline{\mathbf{p}\_{\mathrm{HR}}}$ can be downsampled to the targeted coarse grid, yielding $\overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow$, and we want the average $\overline{\mathbf{p}\_{\mathrm{LR}}}$ of the modified low-resolution model to be as close as possible to this high-resolution reference.

We face here an optimal control problem, as the low-resolution average is a function of the control parameter $\mathbf{p}'$. We state it with the following least-squares formulation

$$\mathbf{p}'\_{\rm opt} = \underset{\mathbf{p}'}{\mathrm{argmin}} \,\mathcal{F}(\mathbf{p}') \tag{4}$$

$$\mathcal{F}(\mathbf{p}') = \left\| \overline{\mathbf{p}\_{\text{LR}}}\left(\mathbf{p}'\right) - \overline{\mathbf{p}\_{\text{HR}}}\!\downarrow \right\|^2 \tag{5}$$

This optimization problem is a priori non-convex, and we should not expect to find a global optimum. In the following, we propose a numerical procedure to find a heuristic approximation $\hat{\mathbf{p}}'$ of the optimal solution $\mathbf{p}'\_{\mathrm{opt}}$.

The implementation of this modified hyperviscosity is simple and computationally cheap: we precompute $\Delta\mathbf{p}'$ and subtract it from $\Delta\mathbf{p}$ at each time-integration step. It increases the integration time of the advection equation (1) by less than 1% on both CPUs and GPUs.
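A minimal sketch of this precomputation (NumPy, with a periodic 5-point Laplacian for brevity; the grid and boundary handling here are illustrative assumptions, the released model follows the boundary treatment of [7]):

```python
import numpy as np

def laplacian(f, dx):
    """5-point Laplacian with periodic wrap (illustrative boundary handling)."""
    return (np.roll(f, 1, axis=0) + np.roll(f, -1, axis=0)
            + np.roll(f, 1, axis=1) + np.roll(f, -1, axis=1) - 4.0 * f) / dx**2

def make_modified_viscosity(p_prime, dx, a2, a4):
    """Precompute lap(p') once and return a closure evaluating the
    modified viscous term (a2*lap - a4*lap^2)(lap(p - p'))."""
    lap_p_prime = laplacian(p_prime, dx)        # time-independent bias, computed once
    def viscous_term(p):
        w = laplacian(p, dx) - lap_p_prime      # lap(p - p')
        lap_w = laplacian(w, dx)
        return a2 * lap_w - a4 * laplacian(lap_w, dx)
    return viscous_term

# With p' = 0 the term reduces to the standard (hyper-)viscosity
rng = np.random.default_rng(0)
p = rng.standard_normal((16, 16))
term = make_modified_viscosity(np.zeros_like(p), dx=1.0, a2=0.0, a4=1.0)(p)
```

The only extra per-step cost relative to plain hyperviscosity is one array subtraction, consistent with the sub-1% overhead quoted above.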

#### *3.3 Modified Viscosity Regularization*

The continuously stratified QG equations can be rewritten in a variational formulation [8] with a Hamiltonian $\mathcal{J}$ defined as

$$\mathcal{J}(\mathbf{p}) = \frac{1}{2} \int\_{\Omega} \frac{1}{f\_0} |\nabla \mathbf{p}|^2 + \frac{f\_0}{N^2} (\partial\_z \mathbf{p})^2.$$

Our model is a discretized version of this continuous stratification. Since we add an external wind forcing term and use an energy-conserving Arakawa advection scheme, we need to add some viscosity or hyperviscosity to dissipate energy. In the variational formulation, these (hyper-)viscous terms become the following penalization

$$\frac{1}{2}\int\_{\Omega} a\_2|\Delta \mathbf{p}|^2 + a\_4|\nabla \left(\Delta \mathbf{p}\right)|^2,$$

added to the Hamiltonian $\mathcal{J}(\mathbf{p})$ to produce a smooth solution. The gradient-norm penalization of $\Delta\mathbf{p}$ (the $a\_4$ hyperviscous term) guides the minimization toward solutions with a smooth Laplacian, while the Laplacian-norm penalization (the $a\_2$ viscous term) enforces a solution of minimal Laplacian norm. The parameters $a\_2$ and $a\_4$ quantify the strength of these regularization constraints.

Here, we simply propose to replace it with the following penalization

$$\frac{1}{2} \int\_{\Omega} a\_2 |\Delta(\mathbf{p} - \mathbf{p}')|^2 + a\_4 \left| \nabla \left( \Delta(\mathbf{p} - \mathbf{p}') \right) \right|^2.$$

We now penalize $(\mathbf{p} - \mathbf{p}')$ instead of $\mathbf{p}$, meaning that we guide the solution towards a possibly non-smooth reference $\mathbf{p}'$ that will produce the correct large-scale behavior.

#### *3.4 Iterative Procedure*

Here we present a method to find a solution to the optimization problem (4). A natural first guess for $\mathbf{p}'\_{\mathrm{opt}}$ is $\overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow$. We solve the equations with this choice and compute the average pressure $\overline{\mathbf{p}\_{\mathrm{LR}}}$. Results are shown in Fig. 4. It is a good first guess, but the difference $\overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow - \overline{\mathbf{p}\_{\mathrm{LR}}}$ is still large.

We propose the following iterative procedure to find a better guess for $\mathbf{p}'\_{\mathrm{opt}}$. In the following we assume that we are at low resolution, i.e. $\mathbf{p} = \mathbf{p}\_{\mathrm{LR}}$ and $\overline{\mathbf{p}} = \overline{\mathbf{p}\_{\mathrm{LR}}}$, unless explicitly written otherwise.

- Set $\mathbf{p}'\_{n+1} = \mathbf{p}'\_{n} + k\left(\overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow - \overline{\mathbf{p}}\_{n}\right)$.
- Evolve the ensemble for *n* years and compute the new average pressure $\overline{\mathbf{p}}\_{n+1}$.

There is no theoretical guarantee that this procedure converges, but we observe in the next section that it converges with the double-gyre QG model that we use.
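On a toy problem where the ensemble average responds linearly to $\mathbf{p}'$, the relaxation update can be sketched as follows (`model_average` is a hypothetical stand-in for an *n*-year ensemble run; everything here is illustrative, not the QG model):

```python
import numpy as np

def iterate(model_average, target, k, n_iter):
    """Relaxed fixed-point iteration: p'_{n+1} = p'_n + k (target - mean_n)."""
    p_prime = target.copy()        # natural first guess: the downsampled HR mean
    errors = []
    for _ in range(n_iter):
        mean_n = model_average(p_prime)       # stands in for an n-year ensemble run
        errors.append(np.linalg.norm(mean_n - target) / np.linalg.norm(target))
        p_prime = p_prime + k * (target - mean_n)
    return p_prime, errors

# Toy linear response: the model average reacts to p' with gain 0.5 plus a bias
rng = np.random.default_rng(1)
bias = rng.standard_normal(8)
target = rng.standard_normal(8)
_, errs = iterate(lambda p: 0.5 * p + bias, target, k=0.7, n_iter=30)
# here the error contracts geometrically with factor |1 - 0.5 k|
```

With a linear response of gain $g$, the error contracts when $|1 - kg| < 1$; an under-relaxed $k < 1$ damps the oscillations observed with $k = 1$.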

#### **4 Results and Discussion**

#### *4.1 Statistics*

We use ensemble averages to compute the statistics. To create ensembles of size *N*, we start from a zero solution and spin up the models for 100 years with a time step of 1200 s to reach statistically steady states, as in [13]. We then run the models for 500 years, saving 10 snapshots a year to obtain 5000 snapshots, from which we randomly select *N* snapshots. The ensemble averages are simply averages over these *N* ensemble members, which we evolve in parallel. Such ensemble averages are denoted with an overbar in the following, i.e. the average pressure is denoted by $\overline{\mathbf{p}}$, the average velocity by $\overline{u}$, etc.
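The snapshot bookkeeping above amounts to the following sketch (illustrative field shapes only):

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 years x 10 snapshots/year of a (here tiny) pressure field
snapshots = rng.standard_normal((5000, 4, 4))

N = 32
idx = rng.choice(5000, size=N, replace=False)  # random draw of ensemble members
members = snapshots[idx]
p_bar = members.mean(axis=0)                   # ensemble average (overbar)
```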

**Fig. 3** Evolution of the relative square error $\left\|\overline{\mathbf{p}}\_n - \overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow\right\|^2 / \left\|\overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow\right\|^2$ w.r.t. iterations of the procedure

**Fig. 4** Top-layer average pressure (top) and velocity (bottom) of (left-to-right) proposed model at low-resolution, reference, and the difference between the two

#### *4.2 Iterative Procedure*

We test the iterative procedure described in Sect. 3.4 with the double-gyre model presented in Sect. 2 in the eddy-permitting regime. We use *n* = 10 years to evolve the ensemble after each iterate. We compute the reference pressure average $\overline{\mathbf{p}\_{\mathrm{HR}}}$ with the same model in the eddy-resolving regime.

Figure 3 shows the relative square error $\left\|\overline{\mathbf{p}}\_n - \overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow\right\|^2 / \left\|\overline{\mathbf{p}\_{\mathrm{HR}}}\!\downarrow\right\|^2$ over the iterations of the procedure with *k* = 1 and *k* = 0.7. The procedure converges with *k* = 0.7 and oscillates with *k* = 1.

Figure 4 shows the output average pressure $\overline{\mathbf{p}}\_n$ of the iterative procedure, the reference $\overline{\mathbf{p}\_{\mathrm{HR}}}$, and the difference between the two, as well as the corresponding zonal velocities $\overline{\mathbf{u}}$. Our model is able to reproduce the eastward jet produced by the high-resolution reference

**Fig. 5** Top-layer kinetic-energy spectra averaged for the models at high resolution (HR), at low resolution (LR), and at low resolution with the proposed modified viscosity. The decreasing slope of the spectrum of the proposed model is much closer to the high-resolution reference

**Fig. 6** PV and zonal velocity snapshots from (left-to-right) the high-resolution model, the low-resolution model, and the proposed model at low resolution

model. The kinetic energy spectra shown in Fig. 5 also demonstrate the improvement of our model compared to the low-resolution one. Finally, Fig. 6 shows high-resolution and low-resolution snapshots as well as a snapshot of the proposed model at low resolution. Our model effectively produces the eastward jet and a re-circulation zone around it where eddies are created. Artifacts can also be observed on the zonal velocity and potential vorticity on the right of Fig. 6. They are likely Rossby waves created by the harmonic regularization terms, which remain an artificial constraint, but this needs to be studied further.

#### **5 Conclusion**

We presented a simple modified-viscosity scheme for coarse-resolution ocean modeling, derived and tested on a double-gyre multi-layer quasi-geostrophic model. We interpret it as a modified regularization technique that guides the solution towards a reference rather than producing an overly smooth solution in the eddy-permitting regime. The technique requires solving an optimization problem, and we presented an iterative procedure to find a good guess for its solution. We showed that it converges to a reasonable solution that fairly reproduces the input reference.

While this method mimics the average of the high-resolution solution, it only partially reproduces its variability and higher-order statistics: we see in Fig. 5 that our model's snapshots resemble the averages. In future work, we consider using this method as a deterministic basis for stochastic parameterizations such as Location Uncertainty [11].

#### **Appendix**

#### *Downsampling Procedure*

Downsampling the high-resolution solution onto a low-resolution grid consists in interpolating the high-resolution (769 × 961) streamfunction on the low-resolution (97 × 121) grid. We can then compute the potential vorticity using Eq. (2). Because of the no-flow constraint, the downsampled streamfunction should be constant on the boundaries and should satisfy a mass conservation constraint [7]. We also want to preserve the frequency information and prevent aliasing.

We use the following procedure:


#### *Parameter Tables*


**Table A.1** Common parameters for all the models

**Table A.2** Grid-dependent parameters


#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Primitive Equations Under Location Uncertainty: Analytical Description and Model Development**

**Francesco L. Tucciarone, Etienne Mémin, and Long Li**

**Abstract** Resolving numerically all the scale interactions of ocean dynamics in a high-resolution realistic configuration is today far beyond reach, and only large-scale representations can be afforded. In this work, we study a stochastic parameterization of the ocean primitive equations derived within the modelling under location uncertainty framework. First numerical assessments, built with the NEMO core code, are provided for a double-gyre configuration.

**Keywords** Stochastic parametrization · Ocean modelling

#### **1 Introduction**

The ocean covers the major part of Earth's surface and has an important stabilizing effect on the climate. For climatic prediction, accurate ensemble forecasts of likely future ocean states are consequently essential. However, due to evident computational limitations, high-resolution simulations are unfeasible and only large-scale ocean representations can be handled. To face this difficulty, and the need of generating different likely future scenarios, there has been a growing interest in the geophysical sciences in setting up flow models that incorporate in their dynamics noise terms related to uncertainties or errors. By accounting for the actions of unresolved processes in a random way, these stochastic models are in general less diffusive than the classical large-scale deterministic models. The unresolved processes include small-scale turbulence effects, boundary value uncertainties, or uncertainties coming either from scale coarsening or from the numerical schemes used. Moreover, compared to classical large-scale deterministic modelling, the additional degree of freedom brought by the stochastic component allows us to devise new intermediate models [4, 3, 6, 7, 8]. The addition of noise in fluid

F. L. Tucciarone (-) · E. Mémin · L. Li

INRIA Rennes Bretagne Atlantique, IRMAR – UMR CNRS 6625, Rennes, France e-mail: francesco.tucciarone@inria.fr; etienne.memin@inria.fr; long.li@inria.fr

B. Chapron et al. (eds.), *Stochastic Transport in Upper Ocean Dynamics*, Mathematics of Planet Earth 10, https://doi.org/10.1007/978-3-031-18988-3\_18

dynamics models cannot be done in a haphazard manner. Ad-hoc choices for the model noise can fundamentally perturb the corresponding fluid dynamics models, making them exhibit unrealistic properties [3]. Rigorously justified methodologies for choosing the model noise have recently been introduced by Mémin [1] and Holm [2]. These derivations lead to large classes of stochastic geophysical fluid dynamics models that preserve either energy or circulation, respectively. Such models naturally emerge from a decomposition of the flow velocity field into a smooth component and a time-uncorrelated random uncertainty term. This decomposition is reminiscent, in spirit, of the classical Reynolds decomposition, and enables the definition of a large-scale representation with a stochastic term representing small-scale effects. The Location Uncertainty (LU) formulation has been found to be more accurate in structuring the large-scale flow [4] and in reproducing long-term statistics [22] for the barotropic quasi-geostrophic model. It also provides a good trade-off between model error representation and ensemble spread [21, 23] for the rotating shallow water model and the surface quasi-geostrophic model. In this work we explore more specifically a stochastic version of the primitive equations, named the primitive equations under Location Uncertainty. The derivation of this model is detailed and first numerical experiments built from the NEMO code are assessed.

#### **2 Location Uncertainty (LU)**

In the LU formalism, the Lagrangian displacement $\mathbf{X}\_t$ associated with a fluid particle is decomposed as:

$$\mathbf{X}\_{t}\left(\mathbf{x}\right) = \mathbf{X}\_{t\_{0}}\left(\mathbf{x}\right) + \int\_{t\_0}^{t} \mathbf{v}\left(\mathbf{X}\_{s}\left(\mathbf{x}\right), s\right) \mathrm{d}s + \int\_{t\_0}^{t} \boldsymbol{\sigma}\left(\mathbf{X}\_{s}\left(\mathbf{x}\right), s\right) \mathrm{d}\mathbf{B}\_{s},\tag{1}$$

where $\mathbf{X}: \Omega \times \mathbb{R}^+ \to \Omega$ is the fluid flow map, that is, the trajectory followed by the fluid particles starting at the initial position $\mathbf{X}|\_{t=0}\,(\mathbf{x}) = \mathbf{x}\_0$ in the bounded domain $\Omega \subset \mathbb{R}^3$. Written in differential form, Eq. (1) takes the usual form:

$$\mathrm{d}\mathbf{X}\_{t}\left(\mathbf{x}\_{0}\right) = \mathbf{v}\left(\mathbf{X}\_{t},t\right)\,\mathrm{d}t + \boldsymbol{\sigma}\left(\mathbf{X}\_{t},t\right)\mathrm{d}\mathbf{B}\_{t}.\tag{2}$$

The first component, $\mathbf{v}(\mathbf{X}\_t, t)$, represents the smooth, resolved velocity field of the flow. It corresponds to the integration of the equations of motion, solved on a grid of a given resolution, and it is assumed to be both spatially and temporally correlated. The second term, $\boldsymbol{\sigma}(\mathbf{X}\_t, t)\,\mathrm{d}\mathbf{B}\_t$, is a stochastic process that gathers the unresolved flow component, uncertainties on the flow, and turbulent effects. This stochastic contribution, often referred to as *noise* in the following, is built from the application of a Hilbert–Schmidt kernel integral operator, $\boldsymbol{\sigma}$, to an $I\_3$-cylindrical Wiener process **B**


$$\left(\boldsymbol{\sigma}\left(\mathbf{X}\_{t},t\right)\,\mathrm{d}\mathbf{B}\_{t}\right)^{i} = \int\_{\Omega} \breve{\sigma}\_{ik}\left(\mathbf{X}\_{t},\mathbf{y},t\right)\,\mathrm{d}\mathbf{B}\_{t}^{k}\left(\mathbf{y}\right)\,\mathrm{d}\mathbf{y},\tag{3}$$

where **B** is defined on a filtered probability space $\{\Omega, \mathcal{F}, \mathbb{P}, (\mathcal{F}\_t)\_t\}$ and $(\mathcal{F}\_t)\_t$ is the filtration adapted to **B**. The application of the (integrable) kernel $\breve{\sigma}$ imposes fast/small-scale spatial correlations and defines a centred Gaussian process $\boldsymbol{\sigma}\mathrm{d}\mathbf{B}\_t \sim \mathcal{N}(0, \mathbf{Q}\mathrm{d}t)$, with covariance tensor defined as

$$Q\_{ij}(\mathbf{x}, \mathbf{y}, t, s) = \mathbb{E}\left[ \left( \boldsymbol{\sigma}(\mathbf{x}, t)\, \mathrm{d}\mathbf{B}\_{t} \right)^{i} \left( \boldsymbol{\sigma}(\mathbf{y}, s)\, \mathrm{d}\mathbf{B}\_{s} \right)^{j} \right] = \delta(t - s)\, \mathrm{d}t \int\_{\Omega} \breve{\sigma}\_{ik}(\mathbf{x}, \mathbf{z}, t)\, \breve{\sigma}\_{kj}(\mathbf{z}, \mathbf{y}, s) \, \mathrm{d}\mathbf{z}.$$

The strength of the noise is measured by the diagonal components of the covariance tensor per unit of time, i.e. the variance tensor **a**, defined by $\mathbf{a}(\mathbf{x}, t)\,\delta(t - t')\,\mathrm{d}t = \mathbf{Q}(\mathbf{x}, \mathbf{x}, t, t')$. The variance tensor is symmetric and positive definite at any point **x** of the domain. Notably, it has the dimension of a viscosity, in m<sup>2</sup> s<sup>−1</sup>. The covariance operator is self-adjoint, positive definite and compact, and admits a convenient spectral decomposition.
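A discrete analogue of this construction: on a periodic one-dimensional grid, taking $\breve{\sigma}$ to be a Gaussian kernel and applying it to white-noise increments yields a correlated noise whose empirical variance per unit time matches the diagonal of **Q** (an illustrative homogeneous sketch, not the operational noise of the numerical experiments):

```python
import numpy as np

n, dt = 64, 1e-2
d = np.arange(n)
d = np.minimum(d, n - d)                      # periodic distance to grid point 0
kernel = np.exp(-0.5 * (d / 3.0) ** 2)        # integrable (Gaussian) kernel
sigma = np.stack([np.roll(kernel, i) for i in range(n)])  # circulant operator

rng = np.random.default_rng(0)
dB = rng.standard_normal((10_000, n)) * np.sqrt(dt)   # Brownian increments
noise = dB @ sigma.T                                  # samples of sigma dB_t

a_emp = noise.var(axis=0) / dt       # empirical variance tensor (diagonal)
a_th = (sigma ** 2).sum(axis=1)      # theoretical diagonal of Q per unit time
```

Since the kernel is homogeneous here, the variance tensor is constant in space; inhomogeneous kernels would make it vary and give rise to the Itô–Stokes drift discussed below.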

In this paper, the noise will always be assumed to be centred, but it can be shown through the Girsanov theorem that one can redefine the Lagrangian displacement (2) as

$$\mathrm{d}\mathbf{X}\_{t}\left(\mathbf{x}\_{0}\right) = \left[\mathbf{v}\left(\mathbf{X}\_{t},t\right) - \boldsymbol{\mu}\_{t}\left(\mathbf{X}\_{t}\right)\right]\mathrm{d}t + \boldsymbol{\sigma}\,\mathrm{d}\widetilde{\mathbf{B}}\_{t}\left(\mathbf{X}\_{t}\right),\tag{4}$$

where the Wiener process $\widetilde{\mathbf{B}}\_t$ is a centred process under a new probability measure $\mathbb{Q}$, drifted by $\boldsymbol{\mu}\_t$. Indeed, a non-centred Wiener process shifted by a random process $(\mathbf{Y}\_t)\_t$ can be defined as:

$$
\widetilde{\mathbf{B}}\_t = \mathbf{B}\_t + \int\_0^t \mathbf{Y}\_s \, \mathrm{d}s.\tag{5}
$$

Under suitable assumptions on $(\mathbf{Y}\_t)\_t$ ($\mathcal{F}\_t$-measurability, almost sure $L^2$-integrability and the Novikov condition), there exists a measure $\mathbb{Q}$ such that $(\widetilde{\mathbf{B}}\_t)\_t$ is a $\mathbb{Q}$-Wiener process. With the non-centred random process $\widetilde{\mathbf{B}}\_t$ we can rewrite the equations with respect to $\widetilde{\mathbf{B}}\_t$ as

$$
\boldsymbol{\sigma} \,\mathrm{d} \mathbf{B}\_t \left( \mathbf{X}\_t \right) = \boldsymbol{\sigma} \,\mathrm{d} \widetilde{\mathbf{B}}\_t \left( \mathbf{X}\_t \right) - \boldsymbol{\sigma} \left( \mathbf{X}\_t, t \right) \mathbf{Y}\_t \,\mathrm{d} t. \tag{6}
$$

Denoting $\boldsymbol{\sigma}(\mathbf{X}\_t, t)\,\mathbf{Y}\_t$ by $\boldsymbol{\mu}\_t$, one can write the Lagrangian displacement (2) as (4); under $\mathbb{Q}$ the Wiener process $\mathrm{d}\widetilde{\mathbf{B}}\_t$ is centred, so $\mathrm{d}\mathbf{X}\_t$ has the same form as (2) but under a new measure. All the arguments provided in the following hold for this process under $\mathbb{Q}$. The use of a drifted noise $\boldsymbol{\sigma}\mathrm{d}\widetilde{\mathbf{B}}\_t$ is fundamental when the processes employed to operationally define the noise are not centred, hence displaying a non-zero time average.

#### **3 Stochastic Transport Theorem**

The derivation of Eulerian flow dynamics models within the LU formalism relies on a stochastic version of the Reynolds transport theorem (SRTT), introduced in [1], which describes the rate of change of a random scalar *q* transported by the stochastic flow (2) within a flow volume *Vt* :

$$\mathrm{d} \int\_{V\_{t}} q \left( \mathbf{x}, t \right) \, \mathrm{d} \mathbf{x} = \int\_{V\_{t}} \left\{ \mathrm{D}\_{t} q + q \nabla \cdot \left[ \mathbf{v}^{\star} \, \mathrm{d}t + \boldsymbol{\sigma} \, \mathrm{d} \mathbf{B}\_{t} \right] \right\} (\mathbf{x}, t) \, \mathrm{d} \mathbf{x}, \tag{7}$$

with the operator

$$\mathrm{D}\_{t}q = \mathrm{d}\_{t}q + \left[\mathbf{v}^{\star}\,\mathrm{d}t + \boldsymbol{\sigma}\,\mathrm{d}\mathbf{B}\_{t}\right] \cdot \nabla q - \frac{1}{2}\nabla \cdot (\mathbf{a}\nabla q) \,\mathrm{d}t,\tag{8}$$

defining the stochastic transport operator. The SRTT is in perfect analogy with the deterministic Reynolds transport theorem (compare with [13], Section 5.3), and the various terms can be interpreted physically. Proceeding in order, the first right-hand-side term of (8) is the *increment in time* at a fixed location of the process $q$, that is, $\mathrm{d}\_t q = q(\mathbf{x}, t + \mathrm{d}t) - q(\mathbf{x}, t)$. This contribution plays the role of the partial time derivative for a process that is not time differentiable. The term enclosed in the square brackets is a *stochastic advection displacement*. It involves a time-correlated modified advection,

$$\mathbf{v}^\star = \mathbf{v} - \frac{1}{2}\nabla \cdot \mathbf{a} + \sigma^\mathrm{T} \left(\nabla \cdot \boldsymbol{\sigma}\right),\tag{9}$$

and a fast-evolving, time-uncorrelated noise $\boldsymbol{\sigma}\mathrm{d}\mathbf{B}\_t$. The advection of the variable $q$ by this term leads to a *multiplicative noise*, which is hence non-Gaussian. This type of noise is often denoted as *transport noise* in the literature. The second term of the modified advection is coined the *Itô–Stokes drift* velocity in [4], $\mathbf{v}\_s = \frac{1}{2}\nabla\cdot\mathbf{a}$. It represents an effective transport velocity resulting from statistical effects due to inhomogeneities of the noise term. The last term of the transport operator is a dissipation term that depicts the mixing mechanism due to the unresolved scales. Following [5], one can consider the transport of a characteristic function to introduce an evolution equation for the Jacobian determinant $J$ of the flow:

$$\mathrm{D}_t J - J\nabla \cdot \left[ \left( \mathbf{v} - \mathbf{v}_s + \sigma^\mathrm{T} \left( \nabla \cdot \boldsymbol{\sigma} \right) \right) \mathrm{d}t + \sigma \,\mathrm{d} \mathbf{B}_t \right] = 0. \tag{10}$$

This equation provides a clear condition for the stochastic flow to be isochoric:

$$\nabla \cdot \left[ \mathbf{v}^\star \,\mathrm{d}t + \sigma \,\mathrm{d}\mathbf{B}_t \right] = 0. \tag{11}$$
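To make the roles of the terms in the stochastic transport operator (8) concrete, the following self-contained sketch integrates a one-dimensional tracer under transport noise with an Euler–Maruyama step. It assumes a constant resolved velocity and homogeneous noise, so that $\mathbf{v}^\star = \mathbf{v}$ and $\mathbf{a} = \sigma^2$; all numerical values are illustrative choices, and this is not the scheme used in the experiments below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 256, 2.0 * np.pi
x = np.linspace(0.0, L, n, endpoint=False)
dx, dt = L / n, 1.0e-3
v, sig = 1.0, 0.5            # resolved velocity and (homogeneous) noise amplitude
a = sig**2                   # variance tensor reduces to a constant scalar here

q = np.exp(-((x - np.pi) ** 2))   # initial tracer blob

def ddx(f):    # centred first derivative, periodic domain
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * dx)

def d2dx2(f):  # centred second derivative, periodic domain
    return (np.roll(f, -1) - 2.0 * f + np.roll(f, 1)) / dx**2

mass0 = q.sum() * dx
for _ in range(2000):
    dB = np.sqrt(dt) * rng.standard_normal()
    # D_t q = 0:  d_t q = -(v dt + sig dB) q_x + (a/2) q_xx dt
    q = q - (v * dt + sig * dB) * ddx(q) + 0.5 * a * d2dx2(q) * dt
mass = q.sum() * dx           # the integral of q is conserved, cf. Eq. (7)
```

Because the domain is periodic and the flow divergence-free, the total tracer mass is conserved up to round-off, in agreement with (7).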

#### **4 Boussinesq Equations**

Under location uncertainty, a stratified ocean can be modelled with a modified version of the Boussinesq equations. The derivation outlined here follows almost verbatim the asymptotic derivation given in [12]. First, one applies the SRTT (7) to the density and imposes conservation, that is $\mathrm{d}\int_{V_t} \rho(\mathbf{x}, t)\,\mathrm{d}\mathbf{x} = 0$. Then, assuming that the fluctuations of density are small compared to the mean,

$$
\rho\left(\mathbf{x},t\right) = \rho_0 \left[1 + \varepsilon \hat{\rho}\left(t,\mathbf{x}\right)\right],\tag{12}
$$

and using *ε* as an asymptotic ordering parameter to perform an expansion of the conservation of mass, the first order is found to be:

$$\nabla \cdot \left[ \mathbf{v}^\star \,\mathrm{d}t + \sigma \,\mathrm{d}\mathbf{B}_t \right] = 0,\tag{13}$$

that can be split into two incompressibility conditions involving both the modified drift velocity $\mathbf{v}^\star$ and the fast-scale component $\sigma\,\mathrm{d}\mathbf{B}_t$, thanks to the uniqueness of the semimartingale decomposition [15]. Applying again the SRTT (7), this time to the momentum, yields

$$
\rho \mathrm{D}_t \mathbf{v} = -\nabla \left( p - \frac{\mu}{3} \nabla \cdot \mathbf{v} \right) \mathrm{d}t - \nabla \left( \mathrm{d}p_t^\sigma \right) - \rho g \mathbf{e}_3 \,\mathrm{d}t,\tag{14}
$$

where the right-hand side comprises pressure forces, compressibility effects [14] and gravitational forces. The compressibility term $\frac{\mu}{3}\nabla\cdot\mathbf{v}$, where $\mu$ is the dynamic viscosity of water, is usually neglected in the deterministic derivation of the Boussinesq model, but is retained in this model in view of the different incompressibility condition (13), which enforces $\nabla\cdot\mathbf{v} = \nabla\cdot\mathbf{v}_s$. Following the classical nondimensionalization procedure [12, 14], characteristic scales are introduced as:

$$\mathbf{x} = L\hat{\mathbf{x}}, \qquad \mathbf{v} = U\hat{\mathbf{v}}, \qquad t = \tau \hat{t}, \qquad p = \frac{\rho_0 U^2}{\varepsilon} \hat{p}, \qquad g = \frac{U^2}{\varepsilon L} \hat{g}, \tag{15}$$

with $\tau = L/U$ the advective time scale. Furthermore, the variance tensor is assumed to scale as $\mathbf{a} = A\hat{\mathbf{a}}$, so that the fast-evolving component $\sigma\,\mathrm{d}\mathbf{B}_t$ and the kernel $\sigma$ can be scaled as

$$
\sigma \,\mathrm{d}\mathbf{B}_t = \sqrt{\frac{AL}{U}}\, \hat{\sigma} \,\mathrm{d}\hat{\mathbf{B}}_t \quad \text{and} \quad \sigma = \sqrt{A} \,\hat{\sigma}. \tag{16}
$$

In this novel framework a non-dimensional parameter $\Upsilon = UL/A$ is introduced to compare advection and stochastic diffusion terms in the momentum equation. This parameter is termed the *stochastic Péclet number*, in perfect similarity with the deterministic advection–diffusion problem [10]. Introducing these variables and following [12], one obtains:

$$\rho_0 \left( 1 + \varepsilon \hat{\rho} \right) \left[ \mathrm{d}_t \hat{\mathbf{v}} + \left[ \left( \hat{\mathbf{v}} - \frac{1}{\Upsilon} \hat{\mathbf{v}}_s \right) \mathrm{d}\hat{t} + \frac{1}{\Upsilon^{1/2}} \hat{\sigma} \, \mathrm{d} \hat{\mathbf{B}}_t \right] \cdot \hat{\nabla} \hat{\mathbf{v}} - \frac{1}{2 \Upsilon} \hat{\nabla} \cdot \left( \hat{\mathbf{a}} \hat{\nabla} \hat{\mathbf{v}} \right) \mathrm{d}\hat{t} \right]$$

$$= \hat{\nabla} \left( - \frac{\rho_0}{\varepsilon} \hat{p} + \frac{1}{3\,\mathrm{Re}} \frac{1}{\Upsilon} \hat{\nabla} \cdot \hat{\mathbf{v}}_s \right) \mathrm{d}\hat{t} - \hat{\nabla} \left( \frac{P^{\sigma}}{U^2} \mathrm{d} \hat{p}_t^{\sigma} \right) - \rho_0 \left( 1 + \varepsilon \hat{\rho} \right) \frac{\hat{g}}{\varepsilon} \mathbf{e}_3 \, \mathrm{d}\hat{t}. \tag{17}$$

Expanding each variable as an asymptotic series with $\varepsilon$ taken as ordering parameter, Eq. (17) provides at lowest order, once nondimensional variables are replaced by dimensional ones,

$$
\nabla p_0 = -\rho_0 g \mathbf{e}_3, \qquad p_0 \left( z \right) = -\rho_0 g \, z. \tag{18}
$$

Decomposing the density into a constant background density and a deviation corresponds, for the pressure variable, to a decomposition into a hydrostatic component and a pressure fluctuation. This splitting,

$$
\rho\left(t, \mathbf{x}\right) = \rho\_0 + \rho'\left(t, \mathbf{x}\right), \qquad p\left(t, \mathbf{x}\right) = p\_0 + p'\left(t, \mathbf{x}\right), \tag{19}
$$

allows the recognition of the first-order component of the pressure as the deviation from the hydrostatic pressure, $p'$, so that Eq. (17) at first order becomes, in dimensional form,

$$\mathrm{d}_t\mathbf{v} + \left[ (\mathbf{v} - \mathbf{v}_s) \,\mathrm{d}t + \sigma\,\mathrm{d}\mathbf{B}_t \right] \cdot \nabla \mathbf{v} - \frac{1}{2} \nabla \cdot (\mathbf{a} \nabla \mathbf{v}) \,\mathrm{d}t$$

$$= \nabla \left( -p' + \frac{\nu}{3} \nabla \cdot \mathbf{v}_s \right) \mathrm{d}t - \nabla \left( \frac{\mathrm{d}p_t^{\sigma}}{\rho_0} \right) - \frac{\rho'}{\rho_0} g \mathbf{e}_3 \,\mathrm{d}t.$$

The splitting (19) also naturally introduces the *buoyancy* $b = -g\,\rho'(t,\mathbf{x})/\rho_0$ into the equations of motion, representing the upward (or downward) force associated with the density anomaly $\rho'$. In terms of buoyancy, the momentum equation can be written as

$$\mathrm{D}_t\mathbf{v} = \nabla \left( -p' + \frac{\nu}{3} \nabla \cdot \mathbf{v}_s \right) \mathrm{d}t - \nabla \left( \frac{\mathrm{d}p_t^{\sigma}}{\rho_0} \right) + b\,\mathbf{e}_3 \,\mathrm{d}t. \tag{20}$$

A stochastic transport equation can be written for the buoyancy from mass conservation. However, in this work a tracer transport equation for salinity, $S$, and temperature, $T$, is preferred, the buoyancy then being related to the tracers through a buoyancy state equation $b = b(T, S, z)$. The conservation of a given tracer $\theta$ is expressed as

$$\mathrm{D}_t\theta + \theta \nabla \cdot \left[(\mathbf{v} - \mathbf{v}_s) \,\mathrm{d}t + \sigma\,\mathrm{d}\mathbf{B}_t\right] = F^{\theta} \,\mathrm{d}t + D^{\theta} \,\mathrm{d}t,\tag{21}$$

where the variation of tracer quantity is balanced by a forcing term *F<sup>θ</sup>* and a diffusive term *D<sup>θ</sup>* . We note that here these terms are assumed to be regular in time, although additional Brownian terms could be considered to encode intermittent forcing. The resulting system, split into horizontal and vertical equations using the convention **v** = *(***u***, w)*, is:

Horizontal momentum:

$$\mathrm{D}_t\mathbf{u} + f\mathbf{e}_3 \times \left(\mathbf{u}\,\mathrm{d}t + \frac{1}{2}\sigma\,\mathrm{d}\mathbf{B}_t^{\mathrm{H}}\right) = \nabla_{\mathrm{H}}\left(-p' + \frac{\nu}{3}\nabla\cdot\mathbf{v}\right)\,\mathrm{d}t - \nabla_{\mathrm{H}}\,\mathrm{d}p_t^{\sigma} \tag{22}$$

Vertical momentum:

$$\mathrm{D}_t w = \frac{\partial}{\partial z}\left(-p' + \frac{\nu}{3}\nabla \cdot \mathbf{v}\right)\,\mathrm{d}t - \frac{\partial}{\partial z}\mathrm{d}p_t^{\sigma} + b\,\mathrm{d}t \tag{23}$$

Temperature and salinity:

$$\mathrm{D}_t T = \kappa_T \Delta T \,\mathrm{d}t,\tag{24}$$

$$\mathrm{D}_t S = \kappa_S \Delta S \,\mathrm{d}t,\tag{25}$$

Incompressibility:

$$
\nabla \cdot \left[ \mathbf{v} - \mathbf{v}_s \right] = 0, \qquad \nabla \cdot \left(\sigma \, \mathrm{d}\mathbf{B}_t\right) = 0,\tag{26}
$$

Equation of state:

$$b = b \left( T, S, z \right). \tag{27}$$
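A common concrete choice for the state equation (27) is a linear approximation in temperature and salinity. The sketch below assumes that linear form with typical illustrative coefficient values; the actual configuration may use a full nonlinear equation of state, and the depth dependence is dropped here for simplicity.

```python
G = 9.81              # gravity [m s^-2]
ALPHA = 2.0e-4        # thermal expansion coefficient [K^-1]   (assumed value)
BETA = 7.6e-4         # haline contraction coefficient [psu^-1] (assumed value)
T0, S0 = 10.0, 35.0   # reference temperature [deg C] and salinity [psu]

def buoyancy(T, S):
    """Linear equation of state b = g*(alpha*(T - T0) - beta*(S - S0));
    warmer water is more buoyant, saltier water less so."""
    return G * (ALPHA * (T - T0) - BETA * (S - S0))
```

With this convention, buoyancy vanishes at the reference state, increases with temperature and decreases with salinity, consistently with the sign of the buoyancy term in (23).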

Temperature and salinity are introduced as active tracers, as they modify the buoyancy field, and their stochastic evolution is obtained again by application of the SRTT (7), balanced with a diffusion process with diffusivities $\kappa_T$ and $\kappa_S$, respectively. The unusual coefficient $1/2$ in the random Coriolis term can be shown to appear naturally from a derivation of the non-inertial acceleration in this stochastic framework, again following the derivation of [12]. Metric terms relative to the rotation of the Earth should also be adapted to the stochastic Frenet–Serret formula $\mathrm{d}\mathbf{C} = \Omega\,\mathrm{d}t \times \mathbf{C}$ in the case of planetary-scale simulations. In Eqs. (22) and (23) the *stochastic pressure* $\mathrm{d}p_t^\sigma$ is introduced; it corresponds to a zero-mean turbulent pressure related to the small-scale velocity component (i.e. the noise), and is a martingale term. An operational model referred to as the primitive equations can be obtained through the so-called hydrostatic balance, resulting from neglecting the vertical acceleration terms through a proper scaling of the velocity. In our stochastic setting, after neglecting the large-scale acceleration terms and for moderate noise ($\Upsilon \sim O(1)$, so that the martingale terms related to the vertical velocity component are negligible), the vertical momentum equation reads:

$$-\frac{\partial p'}{\partial z} + b = 0 \quad \text{and} \quad \frac{\partial\, \mathrm{d}p_t^{\sigma}}{\partial z} = 0,\tag{28}$$

where the bounded-variation terms and the martingale terms have been safely separated. The left equation constitutes the usual hydrostatic balance. With the scaling used, the stochastic pressure is constant along depth and is in balance with the stochastic Coriolis component [9, 5]. These two martingale terms can then be removed from the horizontal momentum equation. In this setting the vertical component of the momentum equation becomes a diagnostic component that can be recovered by integrating the continuity equation given by (26). In a similar way, the large-scale pressure is obtained from the vertical integration of the hydrostatic relation. The scaling parameter $\Upsilon$ can also be related to the ratio between the Mean Kinetic Energy (MKE) and the Turbulent Kinetic Energy (TKE) when an advective time scale is used, that is

$$\Upsilon = \frac{U^2}{A/\tau} = \frac{1}{\epsilon} \frac{\mathrm{MKE}}{\mathrm{TKE}}, \tag{29}$$

where $\epsilon = \tau_\sigma/\tau$ is the ratio of the fast-scale to the slow-scale correlation times. This ratio can be adapted to the different variables involved (i.e. momentum, temperature or salinity), with a value similar to the inverse of the Schmidt number (ratio of diffusion rates), making the noise scaling parameter $\Upsilon$ dependent on the variable transported. The parameter $\Upsilon$ appears in dimensional analysis and asymptotic expansions, but also plays a paramount role in quantifying the strength of the noise.
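As a concrete illustration of Eq. (29), the sketch below estimates $\Upsilon$ from a velocity time series by splitting it into mean and fluctuating parts. The velocity field is a synthetic stand-in for model output, and the value of $\epsilon = \tau_\sigma/\tau$ is an assumed input, not a diagnosed quantity.

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in for model output: 500 snapshots of a velocity field on a 32x32 grid
u = 1.0 + 0.3 * rng.standard_normal((500, 32, 32))

u_mean = u.mean(axis=0)                 # slow, resolved part
u_fluc = u - u_mean                     # fast fluctuations
MKE = 0.5 * np.mean(u_mean**2)          # mean kinetic energy
TKE = 0.5 * np.mean(u_fluc**2)          # turbulent kinetic energy

eps = 0.1                               # tau_sigma / tau (assumed correlation-time ratio)
Upsilon = MKE / (eps * TKE)             # Eq. (29)
```

Larger $\Upsilon$ corresponds to weaker noise relative to advection; in this synthetic example the mean flow dominates the fluctuations, giving $\Upsilon \gg 1$.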

#### **5 Methods**

The experiments are performed with the level-coordinate free-surface primitive equation ocean model NEMO [16]. The domain configuration is a double-gyre configuration consisting of a 45° rotated beta plane centred at ∼30°N, 3180 km long, 2120 km wide and 4 km deep. The domain is bounded by vertical walls and a flat bottom. The seasonally varying wind and buoyancy forcings induce a strong jet appearing diagonally in the domain, separating a warm subtropical gyre from a cold subpolar gyre. Three experiments were performed: two purely deterministic simulations at different resolutions, 1/27° (R27d) and 1/3° (R3d), and one stochastic simulation at 1/3° (R3LU). Each simulation was run for 10 years with data collected every (and averaged over) 5 days. The focus of this paper is to assess the benefits brought by LU to the coarse simulation, so the parameters of the simulation were chosen closely following [17, 18] (see Table 1 for an overview of their values). In this first study, we restrict ourselves to 3D divergence-free horizontal noise (i.e. with no vertical component). In spectral form the random field and the variance tensor can be written as:

$$\sigma\, \mathrm{d} \mathbf{B}_t = \sum_{i \in \mathbb{N}} \lambda_i^{1/2} \boldsymbol{\varphi}_i(\mathbf{x})\, \mathrm{d} \beta_t^i, \qquad \mathbf{a} = \sum_{i \in \mathbb{N}} \lambda_i \boldsymbol{\varphi}_i(\mathbf{x}) \boldsymbol{\varphi}_i^{\mathrm{T}}(\mathbf{x}), \tag{30}$$


**Table 1** Parameters of the model experiments

where $\{\boldsymbol{\varphi}_i(\mathbf{x}),\ i \in \mathbb{N}\}$ are the orthonormal eigenfunctions of the covariance operator, associated to the (real, positive) eigenvalues $\{\lambda_i,\ i \in \mathbb{N}\}$ ranged in decreasing order, and $\{\beta_t^i,\ i \in \mathbb{N}\}$ is a set of standard (scalar) Brownian motions. This representation corresponds to the Karhunen–Loève decomposition [24]. Operationally, a (finite) set of eigenfunctions $\{\boldsymbol{\phi}_i(\mathbf{x}),\ i \in [1, N]\}$ and of eigenvalues $\{\lambda_i,\ i \in [1, N]\}$ is computed through a proper orthogonal decomposition (POD) [11] of the temporal fluctuations of the two-dimensional low-resolution residual $\mathbf{u}_{\mathrm{LR}}$. This velocity residual is obtained through Gaussian filtering of the high-resolution deterministic simulation R27d, $\mathbf{u}_{\mathrm{LR}} = (1 - \mathcal{G})\,\mathbf{u}_{\mathrm{HR}}$, with the fluctuations computed through a Reynolds decomposition:

$$\mathbf{u}'_{\mathrm{LR}}\left(\mathbf{x},t\right) = \mathbf{u}_{\mathrm{LR}}\left(\mathbf{x},t\right) - \overline{\mathbf{u}_{\mathrm{LR}}\left(\mathbf{x},t\right)}^{t} = \sum_{i=1}^{N} \boldsymbol{\phi}_i\left(\mathbf{x}\right) \alpha_i\left(t\right). \tag{31}$$

The POD procedure applied to $\mathbf{u}'_{\mathrm{LR}}(\mathbf{x}, t)$ provides a set $\{\boldsymbol{\phi}_i(\mathbf{x}),\ i \in [1, N]\}$ of eigenfunctions that are stationary in time and such that

$$
\langle \boldsymbol{\phi}_m, \boldsymbol{\phi}_n \rangle = \int_{\Omega} \boldsymbol{\phi}_m^{\mathrm{T}} \boldsymbol{\phi}_n \, (\mathbf{x}) \, \mathrm{d}\mathbf{x} = \delta_{mn}, \qquad \overline{\alpha_m \alpha_n}^{t} = \lambda_m \delta_{mn}. \tag{32}
$$

The eigenfunctions are used to define the random field and a stationary variance tensor as

$$\sigma\,\mathrm{d}\mathbf{B}_t(\mathbf{x}) = \sum_{i=1}^{M(z)} \lambda_i^{1/2} \boldsymbol{\phi}_i(\mathbf{x}) \sqrt{\Delta t} \, \mathrm{d}\beta_t^i, \qquad \mathbf{a}(\mathbf{x}) = \sum_{i=1}^{M(z)} \lambda_i \Delta t \, \boldsymbol{\phi}_i(\mathbf{x}) \boldsymbol{\phi}_i^{\mathrm{T}}(\mathbf{x}), \tag{33}$$

where $\boldsymbol{\varphi}_i = \boldsymbol{\phi}_i \sqrt{\Delta t}$ and $M(z) \le N$ is chosen to provide at least 85% of the energy of the fluid layer. Due to the constraint posed by Eq. (26) on the noise, incompressibility of the horizontal noise is imposed by applying a Helmholtz–Hodge decomposition [19] to each snapshot of the horizontal velocity $\mathbf{u}_{\mathrm{LR}}$. Moreover, the set of eigenfunctions $\{\boldsymbol{\phi}_i(\mathbf{x}),\ i \in [1, N]\}$ is used to construct the drift $\boldsymbol{\mu}_t$ of Eq. (4) in such a way that the distance between $\boldsymbol{\mu}_t$ and $\overline{\mathbf{u}_{\mathrm{LR}}}^{t}$ is minimized, that is

$$\boldsymbol{\mu}_t = \sum_{i=1}^{N} \boldsymbol{\phi}_i(\mathbf{x})\, y_t^i \quad \text{with} \quad \left(y_t^i\right)_{i=1}^{N} = \arg\min \left\| \overline{\mathbf{u}_{\mathrm{LR}}(\mathbf{x},t)}^{t} - \sum_{i=1}^{N} \boldsymbol{\phi}_i(\mathbf{x})\, y_t^i \right\|_2. \tag{34}$$

Due to the orthogonality of the basis functions, the coefficients can easily be recovered as the orthogonal projections $y_t^i = \langle \overline{\mathbf{u}_{\mathrm{LR}}(\mathbf{x},t)}^{t}, \boldsymbol{\phi}_i(\mathbf{x}) \rangle$.
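The construction of Eqs. (31)–(34) can be sketched as follows: the POD modes are obtained from the SVD of the snapshot matrix, the number of retained modes is chosen to capture at least 85% of the energy, and the drift coefficients are recovered by projection. The synthetic snapshots below stand in for the filtered velocity residual; all sizes and amplitudes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_t, n_x = 200, 64
# five "true" spatial structures with decreasing amplitudes, standing in for
# the low-resolution velocity residual snapshots
grid = np.linspace(0.0, 2.0 * np.pi, n_x, endpoint=False)
modes = np.stack([np.sin((k + 1) * grid) for k in range(5)])
amps = rng.standard_normal((n_t, 5)) * np.array([4.0, 2.0, 1.0, 0.5, 0.25])
U = amps @ modes + 0.01 * rng.standard_normal((n_t, n_x))

U_mean = U.mean(axis=0)
Uf = U - U_mean                          # Reynolds fluctuations, cf. Eq. (31)
# POD through the SVD of the (time x space) fluctuation matrix
_, S, PhiT = np.linalg.svd(Uf, full_matrices=False)
lam = S**2 / n_t                         # POD eigenvalues, in decreasing order
energy = np.cumsum(lam) / lam.sum()
M = int(np.searchsorted(energy, 0.85)) + 1   # smallest M with >= 85% energy
phi = PhiT[:M]                           # retained orthonormal eigenfunctions

y = phi @ U_mean                         # projection coefficients, cf. Eq. (34)
```

The retained rows of `PhiT` are orthonormal by construction of the SVD, which is what makes the projection step a simple matrix–vector product.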

#### **6 Results**

In this work we focus on the results of a single realisation. From a qualitative point of view, the effect of the coarsening of the resolution can be seen in Figs. 1 and 2, where the leftmost panel represents the result of the R27d simulation, the central panel shows the results of the R3d simulation and the rightmost panel shows the R3LU simulation. The first noticeable characteristic of the R27d reference simulation is the presence of a primary jet stream inclined at an almost −45° angle, starting at the bottom-left corner and directed towards the centre, and a secondary, smaller jet with the same inclination roughly 80 km above the primary. The presence of both structures is visible in the reference papers [17, 18]. In both figures the comparison between the high-resolution and the low-resolution deterministic simulations shows a degradation of the information about the jet streams. Figure 1, depicting the 10-year averaged relative vorticity $\overline{\zeta}^{10\mathrm{Y}} = \overline{\left(\partial_x v - \partial_y u\right)/f}^{10\mathrm{Y}}$, shows that the deterministic R3d simulation is incapable of reproducing the primary jet characteristic and its positioning, though showing an increased activity in place of the secondary jet stream. The stochastic R3LU simulation presents instead an intensification of the vortical activity in the

**Fig. 1** 10-year averaged relative vorticity $\zeta = \left(\partial_x v - \partial_y u\right)/f$ at the surface layer of the model for the deterministic high resolution (1/27°, left), the deterministic low resolution (1/3°, middle) and the stochastic low resolution (1/3°, right)

**Fig. 2** 5-day averaged sea surface height of the model for the deterministic high resolution (1/27°, left), the deterministic low resolution (1/3°, middle) and the stochastic low resolution (1/3°, right)

**Fig. 3** Left and centre panels: standard deviation of the kinetic energy. The color scale has been adjusted to enhance the differences in the jet region, disregarding the highly energetic boundaries, where peaks reach values of 0.2 m²/s² for R3d and 0.17 m²/s² for R3LU. Right panel: the Gaussian relative entropy for relative vorticity, ζ (cold palette), and kinetic energy (warm palette). The lighter colors represent the deterministic simulation R3d, the darker colors the stochastic simulation R3LU. All the statistics are computed over 10 years

regions of the primary and secondary jet. Considering sea surface height, Fig. 2 shows that the best result is obtained by the stochastic simulation: while not able to distinguish the primary jet stream from the smaller vortices of the secondary jet, it is capable of reproducing the main behaviour. The left and centre panels of Fig. 3 show the difference obtained in terms of variance of the kinetic energy in the two coarse simulations, with greater variability obtained with the stochastic model, especially in the area of the jet stream, where a lesser variability is

**Fig. 4** Vertical profile of temperature after 1 year of simulation (left) and after 10 years (right)

shown in the deterministic case. From a quantitative point of view, the simulations are compared using the Gaussian Relative Entropy (GRE), described in detail in [20], which measures with a single criterion both the mean and variance reconstructions. In the right panel of Fig. 3, values of the GRE for two variables, the relative vorticity $\zeta$ and the kinetic energy $\mathrm{KE} = \left(u^2 + v^2\right)/2$, are compared. For two different depths and in a vertical-average sense ($\overline{GRE}^z$), the relative entropy is smaller for the stochastic simulation, indicating a smaller distance from the distribution given by the reference R27d simulation. The proposed stochastic model thus outperforms the standard deterministic simulation in terms of both relative entropy and intrinsic variability for kinetic energy and vorticity. This behaviour is observed in every layer. In the tracer equations the noise has been scaled with the aid of the Schmidt number, the ratio between the eddy viscosity and the eddy diffusivity. This consideration stems from the fact that the correlation times for transport of momentum and of tracers are not the same, and the difference can be expressed in terms of the Schmidt number. Figure 4 shows the vertical profiles of horizontally averaged temperature, $\overline{T}^{x,y}(z,t) = \int_A T(x,y,z,t)\,\mathrm{d}x\,\mathrm{d}y$, at times $t = 1\,\mathrm{Y}$ and $t = 10\,\mathrm{Y}$ for the three simulations. The vertically averaged temperature shows an increase in the mixing of temperature in the stochastic setting with respect to its deterministic counterparts. This process has been observed to be sensitive to the noise amplitude and might be caused by the structure of the noise and by the effects of the Helmholtz–Hodge decomposition. Further studies to investigate this process with three-dimensional and isopycnal noise are ongoing.
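The Gaussian relative entropy of [20] can be read as the Kullback–Leibler divergence between Gaussians fitted to the model and reference statistics. The sketch below assumes that standard univariate form; the exact criterion of [20] may differ in its details, so this is an illustration of the principle rather than a reimplementation.

```python
import numpy as np

def gaussian_relative_entropy(mu, var, mu_ref, var_ref):
    """KL divergence between N(mu, var) and the reference N(mu_ref, var_ref):
    one number penalising both mean and variance errors; zero iff they match."""
    return 0.5 * ((mu - mu_ref) ** 2 / var_ref
                  + var / var_ref - 1.0 - np.log(var / var_ref))

# a simulation whose mean and variance are closer to the reference scores lower
gre_far = gaussian_relative_entropy(1.0, 2.0, 0.0, 1.0)
gre_near = gaussian_relative_entropy(0.1, 1.1, 0.0, 1.0)
```

The single-criterion aspect is visible in the two calls: both the mean offset and the variance mismatch contribute to the score, so a lower value indicates a distribution closer to the reference in both respects.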

#### **7 Conclusions**

The considered stochastic model has been implemented into the NEMO dynamical core. A 3D horizontal, incompressible noise was considered and shown to successfully increase the capability of a coarse simulation to reproduce the dynamical quantities of interest, when corrected with a stochastic drift leading to a change of probability measure. Both the qualitative behaviour of the jet stream and the quantitative intrinsic variability of the model have been improved. Thermodynamic quantities like temperature and salinity seem not to benefit from this implementation. In future works, more complex non-stationary, fully 3D noises will be investigated within the same setting.

**Acknowledgments** The authors acknowledge the support of the ERC EU project 856408- STUOD.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Bridging Koopman Operator and Time-Series Auto-Correlation Based Hilbert–Schmidt Operator**

**Yicun Zhen, Bertrand Chapron, and Etienne Mémin**

**Abstract** Given a stationary continuous-time process $f(t)$, the Hilbert–Schmidt operator $A_\tau$ can be defined for every finite $\tau$. Let $\lambda_{\tau,i}$ be the eigenvalues of $A_\tau$ in descending order. In this article, a Hilbert space $\mathcal{H}_f$ and the (time-shift) continuous one-parameter semigroup of isometries $\mathcal{K}^s$ are defined. Let $\{v_i,\ i \in \mathbb{N}\}$ be the eigenvectors of $\mathcal{K}^s$ for all $s \ge 0$. Let $f = \sum_{i=1}^{\infty} a_i v_i + f^{\perp}$ be the orthogonal decomposition with descending $|a_i|$. We prove that $\lim_{\tau\to\infty} \lambda_{\tau,i} = |a_i|^2$. The continuous one-parameter semigroup $\{\mathcal{K}^s : s \ge 0\}$ is equivalent, almost surely, to the classical Koopman one-parameter semigroup defined on $L^2(X, \nu)$, if the dynamical system is ergodic and has invariant measure $\nu$ on the phase space $X$.

**Keywords** Singular spectrum analysis · Koopman theory · Hilbert–Schmidt theory

#### **1 Introduction**

Let $\{f(t) \in \mathbb{C} : t \ge 0\}$ be a continuous-time process. We assume that $f$ has zero temporal mean and that the lagged moments exist for all $s \ge 0$:

$$\rho(s) := \lim\_{T \to \infty} \frac{1}{T} \int\_0^T f(t)\bar{f}(t+s)dt. \tag{1}$$
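For a signal composed of a few complex exponentials, $f(t) = \sum_k a_k e^{i\omega_k t}$, the limit (1) evaluates to $\rho(s) = \sum_k |a_k|^2 e^{-i\omega_k s}$. The finite-$T$ sketch below illustrates the estimate; the amplitudes and frequencies are hypothetical choices made for the example.

```python
import numpy as np

T, n = 4000.0, 400_000
dt = T / n
t = dt * np.arange(n)
f = 2.0 * np.exp(1j * 1.3 * t) + 0.5 * np.exp(1j * 3.7 * t)  # toy signal

def rho(s):
    """Finite-T estimate of Eq. (1): (1/T) int_0^T f(t) conj(f(t+s)) dt.
    A periodic shift is an adequate approximation for this toy f."""
    k = int(round(s / dt))
    return np.mean(f * np.conj(np.roll(f, -k)))

# expected: rho(s) ~ 4 * exp(-1.3j*s) + 0.25 * exp(-3.7j*s)
```

The cross terms between distinct frequencies average out at rate $O(1/T)$, so for large $T$ the estimate matches the closed form to a few parts in a thousand.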

Define $\rho_{-s} = \bar{\rho}_s$. In [3] the self-adjoint operator $A_\tau$ is defined to act on $L^2([0, \tau])$:

Y. Zhen (✉) · B. Chapron

Institut Français de Recherche pour l'Exploitation de la Mer, Plouzané, France e-mail: zhenyicun@protonmail.com

E. Mémin INRIA/IRMAR, Rennes, France


$$(A_{\tau}g)(t) = \frac{1}{\tau} \int_0^{\tau} g(s)\rho(t-s)\,\mathrm{d}s,\tag{2}$$

for every $g \in L^2([0, \tau])$ and for all $t \in [0, \tau]$. When $\rho \in L^2_{\mathrm{loc}}(\mathbb{R})$ and $\rho(s)$ is defined for almost all $s \in [0, \tau]$, $A_\tau$ is a Hilbert–Schmidt operator. In particular, $A_\tau$ is compact and always has a purely punctual spectrum. In other words, the Hilbert space $L^2([0, \tau])$ admits a basis $\{\phi_i \in L^2([0, \tau]) : i \in \mathbb{N}\}$ such that each $\phi_i$ is an eigenvector of $A_\tau$. This implies a Karhunen–Loève type of decomposition. Namely, for any $h \in L^2([0, \tau])$, there exist scalars $c_i \in \mathbb{C}$ such that:

$$h(t) = \sum_{i} c_i \phi_i(t),\tag{3}$$

for any *t* ∈ [0*, τ* ].

As stated in [3], the singular spectrum analysis (SSA) algorithm is based on the spectral analysis of $A_\tau$. Given a finite sequence of discrete-time measurements $\{f(n\Delta t) : n = 0, 1, 2, \ldots, N + M,\ (N + M)\Delta t \le \tau\}$, an $(N+1) \times (N+1)$ discretized version of $A_\tau$ can be approximated by:

$$A_{\tau} \approx C_N := \frac{1}{M+1} H_{NM} H_{NM}^{*},\tag{4}$$

where *HNM* is the trajectory matrix defined by

$$H_{NM} = \begin{pmatrix} f(0) & f(\Delta t) & \cdots & f(M\Delta t) \\ f(\Delta t) & f(2\Delta t) & \cdots & f((M+1)\Delta t) \\ \vdots & & & \vdots \\ f(N\Delta t) & f((N+1)\Delta t) & \cdots & f((N+M)\Delta t) \end{pmatrix},\tag{5}$$

and $H_{NM}^{*}$ refers to the conjugate transpose of $H_{NM}$. The matrix $H_{NM}$ can be computed numerically whenever a discrete-time series is available. Intuitively, for $\tau$ large enough and $\Delta t$ small enough, $C_N$ is a good approximation of $A_\tau$. The SSA method starts by calculating the spectral quantities (i.e. eigenvectors, eigenvalues) of $C_N$. The spectral quantities of $A_\tau$ are the theoretical quantities that the spectral quantities of $C_N$ are supposed to represent.

While in practice the SSA method has been applied successfully to a large variety of time series, for a theoretical purpose, yet with practical consequences, one may ask: what is the relation between $A_{\tau_1}$ and $A_{\tau_2}$ for different $\tau_1$ and $\tau_2$? What is the asymptotic behavior of $A_\tau$ as $\tau \to \infty$? In what way are the spectral properties of $A_\tau$ related to intrinsic properties of the dynamical system? These questions are important because for real-world data it is often not possible to obtain a finer sampling time $\Delta t$; longer time series, however, are sometimes available. In this article we generalize the ideas and tools developed in [4] and apply them to the study of $A_\tau$. We shall prove that

$$\lim_{\tau\to\infty} \lambda_{\tau,i} = |a_i|^2,\tag{6}$$

where $\lambda_{\tau,i}$ is the $i$-th largest eigenvalue of $A_\tau$ and $a_i$ is the $i$-th largest (in modulus) coefficient of some eigenvector $v_i$ (of unit length) of the time-shift operator $\mathcal{K}^s$ (for all $s \ge 0$) in the orthogonal decomposition of $f$:

$$f = \sum_{i=1}^{\infty} a_i v_i + f^{\perp},\tag{7}$$

where $f^{\perp}$ denotes the component of $f$ in the orthogonal complement of the space spanned by the eigenfunctions of the time-shift operator. If there are only finitely many $i$ (say only $N$ terms in the summation) in Eq. (7), then we set $a_i = 0$ for $i > N$. The time-shift operator $\mathcal{K}^s$ is closely related to the classical Koopman operator, which is defined to act, as a time-shift operator, on some function space whose domain is the whole phase space of the dynamical system.
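The limit (6) can be checked numerically on a toy almost-periodic signal: with $f(t) = 2e^{1.3it} + 0.5e^{3.7it}$, the leading eigenvalues of the discretized operator, once normalised like the $1/\tau$-scaled $A_\tau$, approach $|a_1|^2 = 4$ and $|a_2|^2 = 0.25$. The frequencies, amplitudes and window sizes below are illustrative choices.

```python
import numpy as np

dt, N, M = 0.05, 120, 400
t = dt * np.arange(N + M + 1)
f = 2.0 * np.exp(1j * 1.3 * t) + 0.5 * np.exp(1j * 3.7 * t)

# trajectory (Hankel) matrix of Eq. (5): row i holds f(i dt), ..., f((i+M) dt)
H = np.lib.stride_tricks.sliding_window_view(f, M + 1)[: N + 1]
C = H @ H.conj().T / (M + 1)                 # Eq. (4)
# normalise by N+1 so the eigenvalues mimic those of the 1/tau-scaled A_tau
lam = np.linalg.eigvalsh(C)[::-1] / (N + 1)
```

Since the toy $f$ is an exact sum of two exponentials, $C$ has numerical rank two, and the remaining eigenvalues are negligible.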

In Sect. 2 we present the main result and a brief introduction of the mathematical tools used by the proof of the main result. All the quantities mentioned above are defined rigorously in Sect. 2. The detailed proof of the main result is presented in Sect. 3.

*Notes and Comments* The main result, as well as the techniques and ideas used for the proof, are close in spirit to those developed in [4]. However, the Hilbert–Schmidt operator $A_\tau$ is defined for a continuous-time process, and the theory developed in [4] does not cover the continuous-time case. The objective of this paper is to confirm that the asymptotic behavior of the Hilbert–Schmidt operator $A_\tau$ is well related to Koopman theory.

#### **2 Preliminaries and the Main Result**

Let {*f (t)* : *t* ≥ 0} be a continuous-time process.

**Assumption 1** *Assume that*

$$\lim\_{T \to \infty} \frac{1}{T} \int\_0^T f(t)dt = 0,\tag{8}$$

*and that ρ(s) is well-defined by Eq.* (1) *for all s* ≥ 0*.*

For any *s* ≥ 0, we use *Fs* to denote the time series {*Fs(t)* = *f (t* + *s)* : *t* ≥ 0}. For any two time series *g* = {*g(t)* : *t* ≥ 0} and *h* = {*h(t)* : *t* ≥ 0}, we define the new time series

$$ag + bh = \{ag(t) + bh(t) : t \ge 0\},\tag{9}$$

where *a, b* <sup>∈</sup> <sup>C</sup>. We consider the following linear space:

$$
\widetilde{\mathcal{H}}_f = \mathrm{Span}_{\mathbb{C}}\{F_s : s \ge 0\}.\tag{10}
$$

Each element $h \in \widetilde{\mathcal{H}}_f$ can be written as

$$h = \sum\_{i=1}^{n} c\_i F\_{s\_i},\tag{11}$$

for some $n \ge 1$, $c_i \in \mathbb{C}$, $s_i \ge 0$. The existence of $\rho(s)$ allows us to define the following positive semi-definite Hermitian form:

$$\langle h, g \rangle = \lim\_{T \to \infty} \frac{1}{T} \int\_0^T h(t)\bar{g}(t)dt. \tag{12}$$

Let $V = \{v \in \widetilde{\mathcal{H}}_f : \langle v, v \rangle = 0\}$. Since the Hermitian form is positive semi-definite, $V$ is a linear subspace of $\widetilde{\mathcal{H}}_f$, and the Hermitian form is strictly positive-definite on the quotient space $\widetilde{\mathcal{H}}_f / V$. Hence it defines an inner product on $\widetilde{\mathcal{H}}_f / V$. We define

$$\mathcal{H}_f := \overline{\widetilde{\mathcal{H}}_f/V},\tag{13}$$

where the closure is taken with respect to the inner product defined above.

We define the operator $\mathcal{K}^s$ on $\widetilde{\mathcal{H}}_f$ for any $s, s_1 \ge 0$:

$$
\mathcal{K}^s F\_{s\_1} = F\_{s\_1 + s}. \tag{14}
$$

It is obvious that

$$
\langle \mathcal{K}^s h, \mathcal{K}^s \mathbf{g} \rangle = \langle h, \mathbf{g} \rangle,\tag{15}
$$

for any $h, g \in \widetilde{\mathcal{H}}_f$ and any $s \ge 0$. Hence $\mathcal{K}^s$ is well-defined on $\widetilde{\mathcal{H}}_f / V$, and can be further extended to the whole of $\mathcal{H}_f$ by continuity. We therefore obtain a one-parameter family of isometric operators $\mathcal{K}^s$ acting on the Hilbert space $\mathcal{H}_f$, and obviously we have

$$
\mathcal{K}^{s\_1} \mathcal{K}^{s\_2} = \mathcal{K}^{s\_1 + s\_2}.\tag{16}
$$

To simplify the notation, we also use $f$ to denote the continuous-time process $F_0$. We further assume:

#### **Assumption 2**

$$\lim\_{s \to 0^+} \|\mathcal{K}^s f - f\|\_{\mathcal{H}\_f} = 0. \tag{17}$$

In other words, Assumption 2 states that the curve

$$\gamma: [0, \infty) \to \mathcal{H}_f, \qquad t \mapsto \mathcal{K}^t f \tag{18}$$

is continuous. Since $\mathcal{H}_f$ is generated by $f$ and the $\mathcal{K}^s$ are isometries for all $s \ge 0$, Assumption 2 implies that $\mathcal{K}^s \to I$ in the strong operator topology as $s \to 0^+$. In other words, $\{\mathcal{K}^s : s \ge 0\}$ forms a strongly continuous semigroup of isometries on $\mathcal{H}_f$.

Under Assumption 2, we have the following decomposition theorem (see Theorem 9.3 in [2]).

**Theorem 1** *Let $\{\mathcal{K}^s : s \ge 0\}$ be a strongly continuous semigroup of isometries on a Hilbert space $\mathcal{H}$. Then $\mathcal{H}$ has the orthogonal decomposition $\mathcal{H} = \mathcal{H}_U \oplus \mathcal{H}_{NU}$, where $\mathcal{H}_U = \bigcap_{s \ge 0} \mathcal{K}^s \mathcal{H}$, and $\mathcal{H}_{NU}$ is isomorphic to $L^2([0,\infty], \mathcal{H}_0)$ for some Hilbert space $\mathcal{H}_0$. $\mathcal{H}_U$ and $\mathcal{H}_{NU}$ are invariant under $\mathcal{K}^s$ for all $s \ge 0$. The operator $\mathcal{K}^s$ restricted to $\mathcal{H}_U$ is a strongly continuous semigroup of unitary operators, and $\mathcal{K}^s$ restricted to $\mathcal{H}_{NU}$ acts as the unilateral shift operator, i.e. for any $\gamma \in \mathcal{H}_{NU} = L^2([0,\infty], \mathcal{H}_0)$,*

$$(\mathbb{K}^s \boldsymbol{\nu})(t) = \boldsymbol{\nu}(t + s) \in \mathcal{H}\_0. \tag{19}$$

Theorem 1 provides us with a useful tool to deal with the completely non-unitary component of $\mathcal{K}^s$. For the unitary component, we have the following spectral representation theorem.

**Theorem 2** *Let $\{U(s) : s \ge 0\}$ be a strongly continuous semigroup of unitary operators on a Hilbert space $\mathcal{H}$. Assume that $\mathcal{H}$ can be generated by $U$ and some $f \in \mathcal{H}$. Then there exists a unitary map $\phi : \mathcal{H} \to L^2(\mathbb{R}, d\mu)$, where $\mu$ is some positive finite measure on $\mathbb{R}$, such that*

$$(\phi(f))(x) = 1,\tag{20}$$

$$(\phi(\mathcal{K}^s g))(x) = e^{isx} (\phi(g))(x) \tag{21}$$

*for all $g \in \mathcal{H}$, $x \in \mathbb{R}$, and $s \ge 0$.*

Theorems 1 and 2 suggest the orthogonal decomposition $\mathcal{H}_f = \mathcal{H}_{f,U} \oplus \mathcal{H}_{f,NU} = L^2(\mathbb{R}, d\mu_f) \oplus L^2([0,\infty), \mathcal{H}_{f,0})$. Furthermore, we can write $\mu_f = \mu_{f,d} + \mu_{f,c}$, where $\mu_{f,d}$ is a countable sum of Dirac measures and $\mu_{f,c}$ is a continuous measure (it assigns no mass to single points). $\mu_{f,c}$ can be composed both of an absolutely continuous part and a singular continuous part with respect to the Lebesgue measure. The decomposition of $\mu_f$ suggests the orthogonal decomposition $\mathcal{H}_{f,U} = L^2(\mathbb{R}, d\mu_{f,d}) \oplus L^2(\mathbb{R}, d\mu_{f,c})$. In sum, we have

$$f = f_{NU} + f_d + f_c,\tag{22}$$

where $f_{NU} \in L^2([0,\infty), \mathcal{H}_{f,0})$, $f_d \in L^2(\mathbb{R}, d\mu_{f,d})$, and $f_c \in L^2(\mathbb{R}, d\mu_{f,c})$. Note that these subspaces are pairwise orthogonal and are all invariant under $\mathcal{K}^s$ for all $s \ge 0$. The support of $\mu_{f,d}$ consists of countably many points. Each point $x_i$ in the support of $\mu_{f,d}$ corresponds to an eigenvector $v_i \in \mathcal{H}_f$ of $\mathcal{K}^s$ for all $s \ge 0$, i.e.

$$(\phi(a_i v_i))(x) = \begin{cases} 1 & \text{if } x = x_i, \\ 0 & \text{otherwise,} \end{cases} \tag{23}$$

and *μf,d (*{*xi*}*)* = |*ai*| 2, where *ai*'s are the coefficients of the eigenvectors in the following decomposition:

$$f = \sum_{i} a_i v_i + f_{NU} + f_c.\tag{24}$$

We rearrange the indices of the $v_i$ so that $|a_1| \ge |a_2| \ge \cdots \ge 0$. In order to make the connection with $A_\tau$, we need the following lemmas.

**Lemma 1** *For any $\tau > 0$ and any $g \in L^2([0,\tau])$, the following integral*

$$\int_0^\tau g(s) \mathcal{K}^s f \, \mathrm{d}s \tag{25}$$

*is well-defined and is an element of $\mathcal{H}_f$.*

The proofs of this and the following lemma use standard arguments from mathematical analysis, and we leave them to the interested reader.

Let

$$\widetilde{\mathcal{H}}_f^{\text{int}} = \Big\{ \int_0^\tau g(s) \mathcal{K}^s f \, \mathrm{d}s \, : \, \tau > 0, \, g \in L^2([0, \tau]) \Big\}. \tag{26}$$

$\widetilde{\mathcal{H}}_f^{\text{int}}$ is a linear subspace of $\mathcal{H}_f$. We have

#### **Lemma 2**

$$
\overline{\widetilde{\mathcal{H}}_f^{\text{int}}} = \mathcal{H}_f.\tag{27}
$$

For simplicity, we use the notation $L^2_\tau := L^2([0,\tau])$. Given Lemma 1, for any $g_1, g_2 \in L^2([0,\tau])$, we define the Hermitian form $\mathbf{A}_\tau : L^2_\tau \times L^2_\tau \to \mathbb{C}$:

$$\mathbf{A}_\tau(g_1, g_2) = \frac{1}{\tau} \Big\langle \int_0^\tau g_1(t) \mathcal{K}^t f \, \mathrm{d}t, \int_0^\tau g_2(s) \mathcal{K}^s f \, \mathrm{d}s \Big\rangle_{\mathcal{H}_f}.\tag{28}$$

The Cauchy–Schwarz inequality implies that

$$\left|\mathbf{A}_\tau(g_1,g_2)\right|^2 \le \frac{1}{\tau^2} \left\| \int_0^\tau g_1(s) \mathcal{K}^s f \,\mathrm{d}s \right\|_{\mathcal{H}_f}^2 \left\| \int_0^\tau g_2(s) \mathcal{K}^s f \,\mathrm{d}s \right\|_{\mathcal{H}_f}^2 \tag{29}$$

$$\le \|g_1\|_{L^2_\tau}^2 \, \|g_2\|_{L^2_\tau}^2 \, \|f\|_{\mathcal{H}_f}^4,\tag{30}$$

where $\langle \cdot, \cdot \rangle_{L^2_\tau}$ refers to the inner product in $L^2_\tau$ and $\langle \cdot, \cdot \rangle_{\mathcal{H}_f}$ refers to the inner product in $\mathcal{H}_f$. Therefore the Riesz representation theorem guarantees that there exists a bounded linear operator $A_\tau : L^2_\tau \to L^2_\tau$ such that $\mathbf{A}_\tau(g_1, g_2) = \langle g_1, A_\tau g_2 \rangle_{L^2_\tau}$. Consequently,

$$(A_\tau g)(t) = \frac{1}{\tau} \Big\langle \int_0^\tau g(s) \mathcal{K}^s f \, \mathrm{d}s, \mathcal{K}^t f \Big\rangle_{\mathcal{H}_f} = \frac{1}{\tau} \int_0^\tau g(s) \rho(t-s)\,\mathrm{d}s,\tag{31}$$

which coincides with the definition of $A_\tau$ in [3]. Assumption 2 implies that $\rho \in L^2_{\mathrm{loc}}(\mathbb{R})$, which in turn implies that $A_\tau$ is a Hilbert–Schmidt operator on $L^2_\tau$. We shall use the following variational description of the eigenvalues.

**Proposition 1 (The Min-Max Principle)** *Let $\mathcal{H}$ be a Hilbert space and $A$ a compact Hermitian operator on $\mathcal{H}$. Let $\lambda_1 \ge \lambda_2 \ge \cdots$ be the eigenvalues of $A$ in descending order. Then*

$$\lambda_i = \max_{\substack{\mathcal{M} \subset \mathcal{H} \\ \dim \mathcal{M} = i}} \min_{v \in \mathcal{M}} \frac{\langle v, Av\rangle}{\|v\|^2}. \tag{32}$$
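As a quick numerical sanity check of Eq. (32) (a sketch, not part of the chapter's argument; the matrix and all names are illustrative), the following Python snippet verifies that for a random Hermitian matrix the maximum in the min-max formula is realized by the span of the top eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (B + B.conj().T) / 2                  # a random 5x5 Hermitian matrix
lam, V = np.linalg.eigh(A)                # eigh returns ascending eigenvalues
lam, V = lam[::-1], V[:, ::-1]            # reorder so lambda_1 >= lambda_2 >= ...

# For M = span of the top-i eigenvectors, the minimal Rayleigh quotient over M
# equals lambda_i, which realizes the maximum in the min-max formula (32).
for i in range(1, 6):
    M = V[:, :i]                          # orthonormal basis of the candidate subspace
    restricted = M.conj().T @ A @ M       # A restricted to M: an i x i Hermitian matrix
    min_rayleigh = np.linalg.eigvalsh(restricted).min()
    assert np.isclose(min_rayleigh, lam[i - 1])
print(np.round(lam, 3))
```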

Our main result is the following.

**Theorem 3 (Main Result)** *Under Assumptions 1 and 2, we have, for all $i \in \mathbb{N}$,*

$$\lim_{\tau \to \infty} \lambda_{\tau, i} = |a_i|^2,\tag{33}$$

*where $\lambda_{\tau,i}$ stands for the $i$-th largest eigenvalue of $A_\tau$.*
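Theorem 3 can be illustrated numerically (a sketch, not from the chapter; the amplitudes $a_1 = 2$, $a_2 = 1$ and the frequencies are made-up values). For a purely almost-periodic autocorrelation $\rho(u) = \sum_j |a_j|^2 e^{i x_j u}$, discretizing the kernel $\frac{1}{\tau}\rho(t-s)$ of $A_\tau$ shows the leading eigenvalues approaching $|a_j|^2$ as $\tau$ grows:

```python
import numpy as np

a = np.array([2.0, 1.0])   # mode amplitudes a_1, a_2 (illustrative values)
x = np.array([1.3, 2.7])   # mode frequencies x_1, x_2 (illustrative values)

def rho(u):
    """Autocorrelation of the pure-point part: rho(u) = sum_j |a_j|^2 e^{i x_j u}."""
    return np.sum(np.abs(a) ** 2 * np.exp(1j * x * u[..., None]), axis=-1)

def top_eigs(tau, n=600, k=3):
    """Discretize (A_tau g)(t) = (1/tau) int_0^tau rho(t - s) g(s) ds on an
    n-point grid and return the k largest eigenvalues of the Hermitian matrix."""
    t = np.linspace(0.0, tau, n, endpoint=False)
    K = rho(t[:, None] - t[None, :]) * (tau / n) / tau   # kernel times quadrature weight
    return np.linalg.eigvalsh(K)[::-1][:k]

for tau in (10.0, 100.0, 1000.0):
    print(tau, np.round(top_eigs(tau), 3))
# As tau grows, the two leading eigenvalues approach |a_1|^2 = 4 and |a_2|^2 = 1,
# and the remaining ones shrink toward 0, as Eq. (33) predicts.
```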

The following proposition [4] demonstrates the correspondence between the eigenfrequencies of the continuous-time and the discrete-time time-shift operators. Please refer to [4] for the notation used in the proposition.

**Proposition 2** *Let $\{f(X_t) : t \ge 0\}$ be a continuous-time process for which $\rho_s$ exists for all $s \ge 0$. Let $\Delta t > 0$ be a time step. Assume that*

$$\lim_{T \to \infty} \frac{1}{T} \int_0^T f(X_t)\bar{f}(X_{t+k\Delta t})\,\mathrm{d}t = \lim_{T \to \infty} \frac{\Delta t}{T} \sum_{n=0}^{\lfloor T/\Delta t \rfloor} f(X_{n\Delta t})\bar{f}(X_{(n+k)\Delta t}),\tag{34}$$

*for all $k \in \mathbb{N}$. Then $\mathcal{H}_f \hookrightarrow \mathcal{H}_f^{cont}$. Let $q$ be an eigenfrequency of the discrete-time operator $\mathcal{K}^{\Delta t}$, i.e. there exists $h \in \mathcal{H}_f \hookrightarrow \mathcal{H}_f^{cont}$ so that $\mathcal{K}^{\Delta t} h = e^{iq}h$. Then there exist an integer $k$ and $h_k \in \mathcal{H}_f^{cont}$ so that*

$$\mathcal{K}^s h_k = e^{i\frac{q + 2k\pi}{\Delta t}s} h_k \tag{35}$$

*for all s* ≥ 0*.*
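The aliasing relation in Eq. (35) can be seen in a small numerical experiment (illustrative step size and frequency, not taken from the chapter): a continuous-time mode sampled every $\Delta t$ is indistinguishable from any mode whose frequency differs by an integer multiple of $2\pi/\Delta t$.

```python
import numpy as np

dt = 0.5                               # sampling step (illustrative)
x_true = 2 * np.pi / dt + 1.0          # continuous frequency, aliased by one turn (k = 1)
n = np.arange(10)                      # discrete sample indices

q = (x_true * dt) % (2 * np.pi)        # eigenfrequency seen by the discrete operator K^{dt}
# The sampled mode coincides with the slower mode of frequency q/dt at every sample:
assert np.allclose(np.exp(1j * x_true * n * dt), np.exp(1j * q * n))

# Candidate continuous frequencies consistent with q are (q + 2*k*pi)/dt, k integer:
candidates = [(q + 2 * k * np.pi) / dt for k in range(-2, 3)]
print(round(q, 3), [round(c, 3) for c in candidates])
# x_true is recovered as the k = 1 candidate.
```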

*Remark 1* It is worth pointing out that the one-parameter semigroup of isometries $\{\mathcal{K}^s : s \ge 0\}$ is equivalent to the classical Koopman one-parameter semigroup $\{\widetilde{\mathcal{K}}^s : s \ge 0\}$ acting on $L^2(X, d\nu)$, almost surely with respect to the initial state of the time series, provided the dynamical system is ergodic and has a finite invariant measure $\nu$ on the phase space $X$. Indeed, if $f \in L^2(X, \nu)$, then $f \widetilde{\mathcal{K}}^s \bar{f} \in L^1(X, d\nu)$, and Birkhoff's ergodic theorem states that $\rho(s) = \nu(f \widetilde{\mathcal{K}}^s \bar{f})$ for almost every initial state $x_0 \in X$. In other words, $\langle f, \mathcal{K}^s f \rangle_{\mathcal{H}_f} = \langle f, \widetilde{\mathcal{K}}^s f \rangle_{L^2(X, d\nu)}$. Note that $f$ is interpreted as a given time series on the left of the equality and as a function on the right. This shows that, under the assumption that the dynamical system is ergodic and (finite) measure-preserving, there is a natural isometric bijection from $\mathcal{H}_f$ to $L^2(X, d\nu)$.

For mathematical interest, we present the main result in an abstract mathematical form.

**Theorem 4 (Main Result in Mathematical Form)** *Let $\mathcal{H}$ be a Hilbert space and $\{\mathcal{K}^s : s \ge 0\}$ a strongly continuous one-parameter semigroup of isometries acting on $\mathcal{H}$. For any $f \in \mathcal{H}$, let $f = \sum_i a_i v_i + f^\perp$, where the $v_i$ are the common eigenvectors of $\mathcal{K}^s$ for all $s \ge 0$, and $f^\perp$ is the component of $f$ that is orthogonal to the eigenspaces of $\mathcal{K}^s$ for all $s \ge 0$. Assume that $|a_1| \ge |a_2| \ge \cdots \ge 0$. For any $\tau > 0$, let $A_{f,\tau}$ be the Hermitian operator on $L^2([0,\tau])$ such that, for any $g \in L^2([0,\tau])$ and any $t \in [0,\tau]$,*

$$(A_{f,\tau}\, g)(t) = \frac{1}{\tau} \int_0^\tau g(s) \langle \mathcal{K}^s f, \mathcal{K}^t f \rangle_{\mathcal{H}}\,\mathrm{d}s.\tag{36}$$

*Then $A_{f,\tau}$ is a Hilbert–Schmidt operator and hence has purely point spectrum. Let $\lambda_{f,\tau,i}$ be the $i$-th largest eigenvalue of $A_{f,\tau}$. Then we have*

$$\lim_{\tau\to\infty} \lambda_{f,\tau,i} = |a_i|^2. \tag{37}$$

#### **3 Proof of the Main Result**

For any fixed small $\epsilon > 0$, we choose $k$ so that $\sum_{i=k+1}^{\infty} |a_i|^2 \le \epsilon$. We have the orthogonal decomposition

$$f = f_d + f_{NU} + f_c = \sum_{i=1}^{k} a_i v_i + \sum_{i=k+1}^{\infty} a_i v_i + f_{NU} + f_c = f_{d,k} + f_{d,\epsilon} + f_{NU} + f_c,\tag{38}$$

where $f_{d,k} \in \mathcal{H}_{f,d,k}$, the subspace of $\mathcal{H}_{f,d}$ spanned by $\{v_1,\dots,v_k\}$; $f_{d,\epsilon} \in \mathcal{H}_{f,d,\epsilon}$, the subspace spanned by the rest of the eigenvectors; $f_{NU} \in \mathcal{H}_{f,NU}$; and $f_c \in \mathcal{H}_{f,c}$. Note that $\mathcal{H}_{f,d,k}$, $\mathcal{H}_{f,d,\epsilon}$, $\mathcal{H}_{f,NU}$, and $\mathcal{H}_{f,c}$ are pairwise orthogonal and invariant subspaces of $\mathcal{H}_f$. Hence, following Eq. (28), for any $g_1, g_2 \in L^2_\tau$,

$$
\begin{aligned}
\langle g_1, A_\tau g_2 \rangle_{L^2_\tau} &= \frac{1}{\tau} \Big\langle \int_0^\tau g_1(s)\mathcal{K}^s f \,\mathrm{d}s, \int_0^\tau g_2(t)\mathcal{K}^t f \,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} \\
&= \frac{1}{\tau} \Big\langle \int_0^\tau g_1(s)\mathcal{K}^s (f_{d,k} + f_{d,\epsilon} + f_c + f_{NU})\,\mathrm{d}s, \int_0^\tau g_2(t)\mathcal{K}^t (f_{d,k} + f_{d,\epsilon} + f_c + f_{NU})\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} \\
&= \frac{1}{\tau} \Big\langle \int_0^\tau g_1(s)\mathcal{K}^s f_{d,k}\,\mathrm{d}s, \int_0^\tau g_2(t)\mathcal{K}^t f_{d,k}\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} + \frac{1}{\tau} \Big\langle \int_0^\tau g_1(s)\mathcal{K}^s f_{d,\epsilon}\,\mathrm{d}s, \int_0^\tau g_2(t)\mathcal{K}^t f_{d,\epsilon}\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} \\
&\quad + \frac{1}{\tau} \Big\langle \int_0^\tau g_1(s)\mathcal{K}^s f_c\,\mathrm{d}s, \int_0^\tau g_2(t)\mathcal{K}^t f_c\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} + \frac{1}{\tau} \Big\langle \int_0^\tau g_1(s)\mathcal{K}^s f_{NU}\,\mathrm{d}s, \int_0^\tau g_2(t)\mathcal{K}^t f_{NU}\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} \\
&= \langle g_1, A_{\tau,d,k}\, g_2 \rangle_{L^2_\tau} + \langle g_1, A_{\tau,d,\epsilon}\, g_2 \rangle_{L^2_\tau} + \langle g_1, A_{\tau,c}\, g_2 \rangle_{L^2_\tau} + \langle g_1, A_{\tau,NU}\, g_2 \rangle_{L^2_\tau},
\end{aligned}
\tag{39}
$$

in which the definitions of $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$, and $A_{\tau,NU}$ are obvious. It is not hard to show that $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$, and $A_{\tau,NU}$ all admit eigendecompositions, since they are all Hilbert–Schmidt Hermitian operators. Note that the cross terms all vanish, as $\mathcal{H}_{f,d,k}$, $\mathcal{H}_{f,d,\epsilon}$, $\mathcal{H}_{f,c}$, and $\mathcal{H}_{f,NU}$ are pairwise orthogonal and invariant under $\mathcal{K}^s$ for all $s \ge 0$.

Let $\lambda_{\tau,d,k,i}$, $\lambda_{\tau,d,\epsilon,i}$, $\lambda_{\tau,c,i}$, and $\lambda_{\tau,NU,i}$ be the $i$-th largest eigenvalue of $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$, and $A_{\tau,NU}$, respectively. We will prove the following identities:

#### **Proposition 3**

$$\lim_{\tau \to \infty} \lambda_{\tau, d, k, i} = |a_i|^2 \quad \text{for } i = 1, \dots, k,\tag{40}$$

$$
\lambda_{\tau,d,\epsilon,1} \le \epsilon \quad \text{for any } \tau > 0,\tag{41}
$$

$$\lim_{\tau \to \infty} \lambda_{\tau, c, 1} = 0,\tag{42}$$

$$\lim_{\tau\to\infty} \lambda_{\tau,NU,1} = 0.\tag{43}$$

Before we prove Eqs. (40)–(43), observe that Propositions 1 and 3 directly imply the main result. Indeed, for any fixed $n$ and any $\epsilon > 0$, we can find $k$ so that $n \le k$ and $\sum_{i=k+1}^{\infty} |a_i|^2 \le \epsilon$. We then take $\tau$ large enough so that $\lambda_{\tau,c,1} \le \epsilon$ and $\lambda_{\tau,NU,1} \le \epsilon$. Note that $A_{\tau,d,k}$, $A_{\tau,d,\epsilon}$, $A_{\tau,c}$, and $A_{\tau,NU}$ are all positive semidefinite. Applying the min-max principle, we have

$$\lambda_{\tau,n} = \max_{\substack{\mathcal{M} \subset L^2_\tau \\ \dim \mathcal{M} = n}} \min_{v \in \mathcal{M}} \frac{\langle v, A_\tau v \rangle}{\|v\|^2} \tag{44}$$

$$= \max_{\substack{\mathcal{M} \subset L^2_\tau \\ \dim \mathcal{M} = n}} \min_{v \in \mathcal{M}} \frac{\langle v, A_{\tau,d,k}\, v \rangle + \langle v, A_{\tau,d,\epsilon}\, v \rangle + \langle v, A_{\tau,c}\, v \rangle + \langle v, A_{\tau,NU}\, v \rangle}{\|v\|^2} \tag{45}$$

$$\ge \max_{\substack{\mathcal{M} \subset L^2_\tau \\ \dim \mathcal{M} = n}} \min_{v \in \mathcal{M}} \frac{\langle v, A_{\tau,d,k}\, v \rangle}{\|v\|^2} = \lambda_{\tau,d,k,n},\tag{46}$$

and that

$$\lambda_{\tau,n} = \max_{\substack{\mathcal{M} \subset L_\tau^2 \\ \dim \mathcal{M} = n}} \min_{v \in \mathcal{M}} \frac{\langle v, A_\tau v \rangle}{\|v\|^2} \tag{47}$$

$$\le \max_{\substack{\mathcal{M} \subset L^2_\tau \\ \dim \mathcal{M} = n}} \min_{v \in \mathcal{M}} \frac{\langle v, A_{\tau,d,k}\, v \rangle}{\|v\|^2} + 2\epsilon = \lambda_{\tau,d,k,n} + 2\epsilon. \tag{48}$$

Combined with Eq. (40), this implies Theorem 3.

*Proof (Equation* (40)*)* Recall from Eq. (23) that each eigenvector $v_i$ corresponds to a point $x_i$ in the support of $\mu_{f,d}$. For any $g \in L^2_\tau$, Theorem 2 states that $\int_0^\tau g(s)\mathcal{K}^s f_{d,k}\,\mathrm{d}s$ has the following representation in $L^2(\mathbb{R}, d\mu)$: for any $x \in \mathbb{R}$,


$$\left(\phi\left(\int_0^\tau g(s) \mathcal{K}^s f_{d,k}\,\mathrm{d}s\right)\right)(x) = \begin{cases} \int_0^\tau g(s) e^{isx_j}\,\mathrm{d}s & \text{if } x = x_j \text{ for some } j \le k, \\ 0 & \text{otherwise.} \end{cases} \tag{49}$$

And

$$\langle g, A_{\tau,d,k}\, g \rangle_{L^2_\tau} = \frac{1}{\tau} \Big\langle \int_0^\tau g(s) \mathcal{K}^s f_{d,k}\,\mathrm{d}s, \int_0^\tau g(t) \mathcal{K}^t f_{d,k}\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} \tag{50}$$

$$= \frac{1}{\tau} \sum_{j=1}^{k} \left\| \int_0^\tau g(s) e^{isx_j}\,\mathrm{d}s \right\|_{L^2(\mathbb{R}, d\mu)}^2 \tag{51}$$

$$= \frac{1}{\tau} \sum_{j=1}^{k} |a_j|^2 \Big| \int_0^\tau g(s) e^{isx_j}\,\mathrm{d}s \Big|^2. \tag{52}$$

Let $\xi_j \in L^2_\tau$ be such that $\xi_j(s) = e^{isx_j}$ for any $s \in [0,\tau]$. Then $\|\xi_j\|^2_{L^2_\tau} = \tau$ and

$$\langle g, A_{\tau,d,k}\, g \rangle_{L^2_\tau} = \frac{1}{\tau} \sum_{j=1}^{k} |a_j|^2 \big|\langle \xi_j, g \rangle_{L^2_\tau}\big|^2 = \sum_{j=1}^{k} \Big|\Big\langle \frac{a_j \xi_j}{\sqrt{\tau}}, g \Big\rangle_{L^2_\tau}\Big|^2. \tag{53}$$

Let $V_{\tau,k} = \operatorname{Span}_{\mathbb{C}}\big\{\frac{a_1 \xi_1}{\sqrt{\tau}}, \frac{a_2 \xi_2}{\sqrt{\tau}}, \dots, \frac{a_k \xi_k}{\sqrt{\tau}}\big\}$. We write $g = g_{\tau,k} + g^\perp$, where $g_{\tau,k} \in V_{\tau,k}$ and $g^\perp \in V_{\tau,k}^\perp$. Then

$$\langle g, A_{\tau,d,k}\, g \rangle_{L^2_\tau} = \sum_{j=1}^k \Big|\Big\langle \frac{a_j \xi_j}{\sqrt{\tau}}, g_{\tau,k} \Big\rangle_{L^2_\tau}\Big|^2. \tag{54}$$

Note that $\dim V_{\tau,k} = k$ for all $\tau > 0$. A direct calculation yields that, for $j \ne l$, $\big\langle \frac{a_j \xi_j}{\sqrt{\tau}}, \frac{a_l \xi_l}{\sqrt{\tau}} \big\rangle_{L^2_\tau} = a_j \bar{a}_l \frac{e^{i(x_j - x_l)\tau} - 1}{i\tau(x_j - x_l)} \to 0$ as $\tau \to \infty$. Therefore the distribution of the eigenvalues of $A_{\tau,d,k}$ approaches the distribution of the eigenvalues of

$$
\begin{pmatrix}
|a_1|^2 & 0 & \cdots & 0 \\
0 & |a_2|^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & |a_k|^2
\end{pmatrix}
\tag{55}
$$

as *τ* → ∞. This completes the proof of Eq. (40).

*Proof (Equation* (41)*)* Similarly to Eq. (53), for any $g \in L^2_\tau$ with $\|g\|_{L^2_\tau} = 1$, we have

$$\langle g, A_{\tau,d,\epsilon}\, g \rangle_{L^2_\tau} = \frac{1}{\tau} \sum_{j=k+1}^{\infty} |a_j|^2 \big|\langle \xi_j, g \rangle_{L^2_\tau}\big|^2 = \sum_{j=k+1}^{\infty} \Big|\Big\langle \frac{a_j \xi_j}{\sqrt{\tau}}, g \Big\rangle_{L^2_\tau}\Big|^2 \tag{56}$$

$$\le \sum_{j=k+1}^{\infty} |a_j|^2 \le \epsilon. \tag{57}$$

Then the min-max principle implies that $\lambda_{\tau,d,\epsilon,1} \le \epsilon$.

*Proof (Equation* (42)*)* Following [1] (pages 39–41), we first show that

$$\lim_{\tau\to\infty} \frac{1}{\tau} \int_0^\tau \left| \mu_{f,c}(e^{isx}) \right| \mathrm{d}s = 0,\tag{58}$$

or equivalently

$$\lim_{\tau\to\infty} \frac{1}{\tau} \int_0^\tau \left| \mu_{f,c}(e^{isx}) \right|^2 \mathrm{d}s = 0. \tag{59}$$

Equation (58) means that the set of times $s$ at which the Fourier transform of the continuous spectral measure is large has density zero. For any $\epsilon > 0$, we write $\mu_{f,c} = \mu_{f,c,1} + \mu_{f,c,\epsilon}$, in which $\mu_{f,c,1}$ has compact support, $\mu_{f,c,\epsilon}(\mathbb{R}) < \epsilon$, and $\mu_{f,c,1} \perp \mu_{f,c,\epsilon}$. Denote the support of $\mu_{f,c,1}$ by $B_1$. Then we have

$$\frac{1}{\tau} \int_0^\tau \left| \mu_{f,c}(e^{isx}) \right|^2 \mathrm{d}s \le \frac{2}{\tau} \int_0^\tau \left| \mu_{f,c,1}(e^{isx}) \right|^2 \mathrm{d}s + \frac{2}{\tau} \int_0^\tau \left| \mu_{f,c,\epsilon}(e^{isx}) \right|^2 \mathrm{d}s \tag{60}$$

$$< \frac{2}{\tau} \int_0^\tau \Big| \int_{\mathbb{R}} e^{isx}\,\mathrm{d}\mu_{f,c,1}(x) \Big|^2 \mathrm{d}s + 2\epsilon^2, \tag{61}$$

using $|\mu_{f,c,\epsilon}(e^{isx})| \le \mu_{f,c,\epsilon}(\mathbb{R}) < \epsilon$,

and that

$$\frac{1}{\tau} \int_0^\tau \left| \mu_{f,c,1}(e^{isx}) \right|^2 \mathrm{d}s = \frac{1}{\tau} \int_0^\tau \Big| \int_{\mathbb{R}} e^{isx}\,\mathrm{d}\mu_{f,c,1}(x) \Big|^2 \mathrm{d}s$$

$$= \frac{1}{\tau} \int_0^\tau \mathrm{d}s \int_{\mathbb{R}} \int_{\mathbb{R}} e^{is(x-y)}\,\mathrm{d}\mu_{f,c,1}(x)\,\mathrm{d}\mu_{f,c,1}(y) \tag{62}$$

$$= \frac{1}{\tau} \int_{\mathbb{R}} \int_{\mathbb{R}} \mathrm{d}\mu_{f,c,1}(x)\,\mathrm{d}\mu_{f,c,1}(y) \int_0^\tau e^{is(x-y)}\,\mathrm{d}s \tag{63}$$

$$= \frac{1}{\tau} \int_{B_1} \int_{B_1} \mathrm{d}\mu_{f,c,1}(x)\,\mathrm{d}\mu_{f,c,1}(y) \int_0^\tau e^{is(x-y)}\,\mathrm{d}s. \tag{64}$$

Note that $\big| \frac{1}{\tau}\int_0^\tau e^{is(x-y)}\,\mathrm{d}s \big| \le 1$ for any $\tau > 0$ and any $x, y \in \mathbb{R}$, and when $x \ne y$,

$$1 \ge \left| \frac{1}{\tau} \int_0^\tau e^{is(x - y)}\,\mathrm{d}s \right| = \left| \frac{e^{i\tau(x - y)} - 1}{\tau i(x - y)} \right| \xrightarrow[\tau \to \infty]{} 0. \tag{65}$$

Since $\mu_{f,c,1}$ is continuous, we have $(\mu_{f,c,1} \times \mu_{f,c,1})(\{(x, y) \in \mathbb{R}^2 : x = y\}) = 0$. Hence the integral in Eq. (64) boils down to an integral over $\mathbb{R}^2 \setminus \{x = y\}$, and Lebesgue's dominated convergence theorem implies that the integral in Eq. (64) converges to 0 as $\tau \to \infty$. Hence $\limsup_{\tau\to\infty} \frac{1}{\tau}\int_0^\tau \big|\mu_{f,c}(e^{isx})\big|^2\,\mathrm{d}s$ can be made arbitrarily small, which implies Eq. (59).
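The averaging mechanism behind Eqs. (58)–(59) can be seen numerically (an illustrative sketch with a made-up measure, not from the chapter): for the standard Gaussian measure, which is absolutely continuous, $\mu(e^{isx}) = e^{-s^2/2}$, and the time average of $|\mu(e^{isx})|^2$ over $[0,\tau]$ decays like $1/\tau$.

```python
import numpy as np

def time_average(tau, m=200000):
    """Riemann-sum approximation of (1/tau) * int_0^tau |mu-hat(s)|^2 ds for the
    standard Gaussian measure, whose Fourier transform is mu-hat(s) = e^{-s^2/2}."""
    s = np.linspace(0.0, tau, m, endpoint=False)
    return np.sum(np.exp(-s ** 2)) * (tau / m) / tau    # |e^{-s^2/2}|^2 = e^{-s^2}

avgs = [time_average(tau) for tau in (10.0, 100.0, 1000.0)]
print([f"{v:.2e}" for v in avgs])
# The averages decay like sqrt(pi)/(2 tau) -> 0, consistent with Eqs. (58)-(59),
# whereas an atom at x_0 would contribute a non-decaying term |mu({x_0})|^2.
```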

For any $g \in L^2_\tau$, Theorem 2 implies that

$$\left(\phi\left(\int_0^\tau g(s) \mathcal{K}^s f_c\,\mathrm{d}s\right)\right)(x) = \int_0^\tau g(s) e^{isx}\,\mathrm{d}s.\tag{66}$$

Therefore

$$\langle g, A_{\tau,c}\, g \rangle_{L^2_\tau} = \frac{1}{\tau} \Big\langle \int_0^\tau g(s)\mathcal{K}^s f_c\,\mathrm{d}s, \int_0^\tau g(t)\mathcal{K}^t f_c\,\mathrm{d}t \Big\rangle_{\mathcal{H}_f} \tag{67}$$

$$= \frac{1}{\tau} \Big\langle \phi\Big(\int_0^\tau g(s)\mathcal{K}^s f_c\,\mathrm{d}s\Big), \phi\Big(\int_0^\tau g(t)\mathcal{K}^t f_c\,\mathrm{d}t\Big) \Big\rangle_{L^2(\mathbb{R}, d\mu_c)} \tag{68}$$

$$= \frac{1}{\tau} \int_{-\infty}^{\infty} \mathrm{d}\mu_{f,c}(x) \int_0^\tau \int_0^\tau g(s)\bar{g}(t)\, e^{i(s-t)x}\,\mathrm{d}s\,\mathrm{d}t \tag{69}$$

$$= \frac{1}{\tau} \int_0^\tau \int_0^\tau g(s)\bar{g}(t)\, \mu_{f,c}\big(e^{i(s-t)x}\big)\,\mathrm{d}s\,\mathrm{d}t. \tag{70}$$

Hence

$$|\langle g, A_{\tau,c}\, g \rangle| \le \frac{1}{\tau} \int_0^\tau \int_0^\tau |g(t)| \cdot |g(s)| \cdot \big|\mu_{f,c}(e^{i(s-t)x})\big|\,\mathrm{d}t\,\mathrm{d}s \tag{71}$$

$$= \frac{1}{\tau} \iint_{0 \le s \le t \le \tau} |g(t)| \cdot |g(s)| \cdot \big|\mu_{f,c}(e^{i(s-t)x})\big|\,\mathrm{d}t\,\mathrm{d}s \tag{72}$$

$$\quad + \frac{1}{\tau} \iint_{0 \le t \le s \le \tau} |g(t)| \cdot |g(s)| \cdot \big|\mu_{f,c}(e^{i(s-t)x})\big|\,\mathrm{d}t\,\mathrm{d}s \tag{73}$$

$$= \frac{2}{\tau} \int_0^\tau |g(t)| \int_t^\tau |g(s)| \cdot \big|\mu_{f,c}(e^{i(s-t)x})\big|\,\mathrm{d}s\,\mathrm{d}t \tag{74}$$

$$= \frac{2}{\tau} \int_0^\tau |g(t)| \int_0^{\tau-t} |g(t+s)| \cdot \big|\mu_{f,c}(e^{isx})\big|\,\mathrm{d}s\,\mathrm{d}t \tag{75}$$

$$\le \frac{2}{\tau} \int_0^\tau \int_0^{\tau-t} \frac{1}{2}\big(|g(t)|^2 + |g(t+s)|^2\big)\big|\mu_{f,c}(e^{isx})\big|\,\mathrm{d}s\,\mathrm{d}t \tag{76}$$

$$= \frac{1}{\tau} \int_0^\tau \big|\mu_{f,c}(e^{isx})\big| \int_0^{\tau-s} \big(|g(t)|^2 + |g(t+s)|^2\big)\,\mathrm{d}t\,\mathrm{d}s \tag{77}$$

$$\le \frac{1}{\tau} \int_0^\tau 2\big|\mu_{f,c}(e^{isx})\big| \cdot \|g\|^2_{L^2_\tau}\,\mathrm{d}s. \tag{78}$$

Therefore

$$\lambda_{\tau,c,1} = \max_{\substack{g \in L_\tau^2 \\ g \ne 0}} \frac{\langle g, A_{\tau,c}\, g \rangle}{\|g\|^2_{L^2_\tau}} \to 0,\tag{79}$$

as *τ* → ∞. This completes the proof of Eq. (42).

*Proof (Equation* (43)*)* Recall that $\mathcal{H}_{f,NU} \cong L^2([0,\infty), \mathcal{H}_0)$. Hence $f_{NU}$ can be represented as a curve from $[0,\infty)$ to $\mathcal{H}_0$. We denote this curve by $\gamma$ and, without ambiguity, do not distinguish between $\gamma$ and $f_{NU}$. Hence for each $t \ge 0$, $\gamma(t) \in \mathcal{H}_0$, and $\|\gamma\|^2_{\mathcal{H}_{f,NU}} = \int_0^\infty \|\gamma(t)\|^2_{\mathcal{H}_0}\,\mathrm{d}t$. Recall that $(\mathcal{K}^s\gamma)(t) = \gamma(t+s)$. We set $\gamma(t) = 0$ for all $t < 0$. Hence for any $g \in L^2_\tau$,

$$\langle g, A_{\tau,NU}\, g \rangle_{L^2_\tau} = \frac{1}{\tau} \Big\langle \int_0^\tau g(s_1)\mathcal{K}^{s_1}\gamma\,\mathrm{d}s_1, \int_0^\tau g(s_2)\mathcal{K}^{s_2}\gamma\,\mathrm{d}s_2 \Big\rangle_{\mathcal{H}_{f,NU}} \tag{80}$$

$$= \frac{1}{\tau} \int_0^\infty \int_0^\tau \int_0^\tau \bar{g}(s_2)\, g(s_1)\, \langle \gamma(t+s_1), \gamma(t+s_2) \rangle_{\mathcal{H}_0}\,\mathrm{d}s_1\,\mathrm{d}s_2\,\mathrm{d}t \tag{81}$$

$$= \frac{1}{\tau} \int_0^\tau \int_0^\tau \bar{g}(s_2)\, g(s_1) \int_0^\infty \langle \gamma(t+s_1), \gamma(t+s_2) \rangle_{\mathcal{H}_0}\,\mathrm{d}t\,\mathrm{d}s_1\,\mathrm{d}s_2. \tag{82}$$

We first show the following identity:

$$\lim_{s \to \infty} \langle \gamma, \mathcal{K}^s \gamma \rangle_{\mathcal{H}_{f,NU}} = \lim_{s \to \infty} \int_0^\infty \langle \gamma(t), \gamma(t+s) \rangle_{\mathcal{H}_0}\,\mathrm{d}t = 0. \tag{83}$$

To prove Eq. (83), we assume without loss of generality that $\|\gamma\|_{\mathcal{H}_{f,NU}} = 1$. For any $\epsilon > 0$, there exists $N_\epsilon$ so that $\int_0^{N_\epsilon} \|\gamma(t)\|^2_{\mathcal{H}_0}\,\mathrm{d}t > 1 - \epsilon$. This means that $\int_{N_\epsilon}^\infty \|\gamma(t)\|^2_{\mathcal{H}_0}\,\mathrm{d}t < \epsilon$. Therefore, for any $s \ge N_\epsilon$,

$$\left| \int_0^\infty \langle \gamma(t), \gamma(t+s) \rangle_{\mathcal{H}_0}\,\mathrm{d}t \right|^2 \le \int_0^\infty \|\gamma(t)\|^2_{\mathcal{H}_0}\,\mathrm{d}t \cdot \int_{N_\epsilon}^\infty \|\gamma(t)\|^2_{\mathcal{H}_0}\,\mathrm{d}t < \epsilon. \tag{84}$$

This proves Eq. (83). Now we continue from Eq. (82):

$$\langle g, A_{\tau,NU}\, g \rangle_{L^2_\tau} \le \Big| \frac{2}{\tau} \int_0^\tau \int_{s_1}^\tau \bar{g}(s_2)\, g(s_1)\, \langle \mathcal{K}^{s_1}\gamma, \mathcal{K}^{s_2}\gamma \rangle_{\mathcal{H}_{f,NU}}\,\mathrm{d}s_2\,\mathrm{d}s_1 \Big| \tag{85}$$

$$\le \Big| \frac{2}{\tau} \int_0^\tau \int_0^{\tau - s_1} g(s_1)\,\bar{g}(s_1 + s)\, \langle \gamma, \mathcal{K}^s \gamma \rangle_{\mathcal{H}_{f,NU}}\,\mathrm{d}s\,\mathrm{d}s_1 \Big|. \tag{86}$$

For any $\epsilon > 0$, find $M_\epsilon$ so that $|\langle \gamma, \mathcal{K}^s\gamma \rangle| < \epsilon$ for any $s > M_\epsilon$. Now for any $\tau > M_\epsilon/\epsilon$ and any $g$ with $\|g\|_{L^2_\tau} = 1$, we have

$$\langle g, A_{\tau,NU}\, g \rangle_{L^2_\tau} \tag{87}$$

$$\le \frac{2}{\tau} \int_0^\tau \int_0^{M_\epsilon} |g(s_1)| \cdot |g(s_1+s)| \cdot \big|\langle \gamma, \mathcal{K}^s\gamma \rangle_{\mathcal{H}_{f,NU}}\big|\,\mathrm{d}s\,\mathrm{d}s_1 \tag{88}$$

$$\quad + \frac{2}{\tau} \int_0^\tau \int_{M_\epsilon}^{\tau-s_1} |g(s_1)| \cdot |g(s_1+s)| \cdot \big|\langle \gamma, \mathcal{K}^s\gamma \rangle_{\mathcal{H}_{f,NU}}\big|\,\mathrm{d}s\,\mathrm{d}s_1 \tag{89}$$

$$\le \frac{1}{\tau} \int_0^\tau \int_0^{M_\epsilon} \big(|g(s_1)|^2 + |g(s_1+s)|^2\big)\,\big|\langle \gamma, \mathcal{K}^s\gamma \rangle_{\mathcal{H}_{f,NU}}\big|\,\mathrm{d}s\,\mathrm{d}s_1 \tag{90}$$

$$\quad + \frac{1}{\tau} \int_0^\tau \int_{M_\epsilon}^{\tau-s_1} \big(|g(s_1)|^2 + |g(s_1+s)|^2\big)\,\big|\langle \gamma, \mathcal{K}^s\gamma \rangle_{\mathcal{H}_{f,NU}}\big|\,\mathrm{d}s\,\mathrm{d}s_1 \tag{91}$$

$$\le \frac{1}{\tau} \int_0^\tau \int_0^{M_\epsilon} |g(s_1)|^2\,\mathrm{d}s\,\mathrm{d}s_1 + \frac{1}{\tau} \int_0^\tau \int_0^{M_\epsilon} |g(s_1+s)|^2 \cdot \big|\langle \gamma, \mathcal{K}^s\gamma \rangle_{\mathcal{H}_{f,NU}}\big|\,\mathrm{d}s\,\mathrm{d}s_1 \tag{92}$$

$$\quad + \frac{1}{\tau} \int_0^\tau \int_{M_\epsilon}^{\tau-s_1} \epsilon\,\big(|g(s_1)|^2 + |g(s_1+s)|^2\big)\,\mathrm{d}s\,\mathrm{d}s_1 \tag{93}$$

$$\le \frac{M_\epsilon}{\tau} + \frac{M_\epsilon}{\tau} + \frac{2\epsilon}{\tau}(\tau - M_\epsilon) \le 4\epsilon. \tag{94}$$

Therefore, for $\tau > M_\epsilon/\epsilon$,

$$\lambda_{\tau,NU,1} = \max_{\substack{g \in L_\tau^2 \\ \|g\|_{L^2_\tau} = 1}} \langle g, A_{\tau,NU}\, g \rangle \le 4\epsilon. \tag{95}$$

This completes the proof of Eq. (43).

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

## **Index**

#### **B**

Boulvard, Pierre-Marie, 57

#### **C**

Chapron, Bertrand, 211, 223, 259, 301
Collard, Fabrice, 211
Crisan, Dan, 19, 43

#### **D**

Debussche, Arnaud, 15
Dinvay, Evgueni, 27
Dufée, Benjamin, 43

#### **F**

Fablet, Ronan, 211
Fiorini, Camilla, 57
Flandoli, Franco, 69

#### **G**

Goodair, Daniel, 87

#### **H**

Hascoet, Erwan, 259
Holm, Darryl D., 109
Hug, Berenger, 15
Hu, Ruiao, 109, 135

#### **L**

Lang, Oana, 159
Li, Long, 57, 179, 273, 287
Lobbe, Alexander, 195
Luongo, Eliseo, 69

#### **M**

Mémin, Etienne, 15, 43, 57, 179, 273, 287, 301
Mensah, Prince Romeo, 1

#### **O**

Ouala, Said, 211

#### **P**

Pan, Wei, 159
Patching, Stuart, 135
Platzer, Paul, 223

#### **R**

Reich, Sebastian, 237
Resseguier, Valentin, 259

#### **S**

Street, Oliver D., 109

#### **T**

Tandeo, Pierre, 211, 223
Thiry, Louis, 273
Tissot, Gilles, 179
Tucciarone, Francesco L., 287

#### **Z**

Zhen, Yicun, 301